REPORTS SUBMITTED


The following reports were prepared in partial fulfillment of contract
DA49-092-ARO-130 requirements during the course of the Geographic
Distribution of Infectious Disease Project:

(1) 28 Jan. 1966, Contractor Quarterly (1st) Progress Report (15 Nov. 65 -
1 Feb. 66), 3 pp.

(2) 1 May 1966, Contractor (1st) Semi-Annual Progress Report (15 Nov. 65 -
1 May 66), 6 pp.

(3) 8 Aug. 1966, Contractor Quarterly (3rd) Progress Report (1 May 66 -
1 Aug. 66), 5 pp.

(4) 15 Dec. 1966, Contractor (1st) Annual Progress Report (15 Nov. 65 -
14 Nov. 66), 102 pp.

(5) 3 Mar. 1967, Contractor Quarterly (5th) Progress Report (15 Nov. 66 -
15 Feb. 67), 6 pp.

(6) 15 May 1967, Contractor (2nd) Semi-Annual Progress Report (15 Nov. 66 -
15 May 67), 76 pp.

(7) 31 Aug. 1968, Contractor FINAL REPORT (15 Nov. 65 - 31 Aug. 1968),
430 pp.


In addition to the seven reports listed above (for external use) there
were six other reports prepared for internal use: four of these were
prepared by Planning Research Corporation in fulfillment of subcontractor
requirements, and two were prepared by members of our own staff: Capt. Roger
J. Cuffey (AFIP) and Mr. G. G. Gullett (Univ. Ill.).


The final report, i.e., this volume, incorporates all significant
(current) information that has been presented in all previous reports,
internal as well as external. (The internal report dealing with
published maps that relate to ecology of disease, by Mr. G. G. Gullett, is
reproduced in its entirety.)
OBJECTIVES


Present objectives of the MOD system, in the context of nearly three
years' experience with this program, are presented in the Introduction (1.2).
But there would be advantage in presenting our objectives at the time the
MOD project began. These objectives are contained in our original
application, dated 15 June 1965. They are reproduced here verbatim.


*  *  * 


A. OBJECTIVE OF THE PROGRAM  The ultimate objective of the program is to
develop research methodology by means of which the occurrence of a particular
disease may be correlated with a variety of sociological, physical and
environmental factors such as population density, races, ethnic groups,
altitude, temperature, humidity, character of the soil, agricultural products,
possible insect vectors and animal reservoirs of disease.

The  immediate  objective  of  the  program  is  to  provide  data  in  the  form  of 
disease  distribution  maps  and  atlases,  showing  prevalence,  incidence,  and 
severity  of  specific  infectious  diseases  throughout  the  world. 
B. TECHNICAL NEED FOR THIS PROGRAM  Data on the geographic distribution
of infectious disease are of obvious importance in evaluating the disease
risk for groups of persons assigned to foreign posts and in any detailed
planning that involves the socio-economic problems of a particular area.

There have been only two major contributions in this field. They are: (a)
Geographic Atlas of Disease, prepared by the American Geographical Society,
published during 1950-53. (b) World-Atlas of Epidemic Diseases, edited by
Professor Ernst Rodenwaldt (Heidelberg), published in 1952 but reflecting
data gathered some years before. However, data on some developing countries
of current interest are either sparse or completely lacking. The methodology
developed by this program would provide a means for linking contributing and
precipitating factors with a given disease, thereby providing clues to the
etiology of the disease and suggesting specific basic research for methods of
control.


C. RELEVANCE TO ARPA MISSION  Infectious diseases are the greatest cause
of morbidity and mortality among troops and civilians in time of war.
Infectious diseases also exert a major influence on the socio-economic status
of all countries, especially developing countries. The proposed study would
provide valuable information on the distribution, prevalence, and incidence
of infectious disease throughout the world, and would provide a method for
carrying out rapid and effective searches for important interrelationships
among a large variety of potential causal factors of a given disease.

The development of methods of control of infectious diseases is in accordance
with ARPA/AGILE's mission of conducting research, development, and tests of
techniques and equipment required by local forces in remote area conflict
situations.
PROJECT PLANS


To avoid redundancy, we direct the reader to the Preface, Section 1,
Introduction (particularly the Definition of Goals, 1.21), Section 2,
Technical Summary, and Section 9, General Summary, Conclusions and
Recommendations, where project plans are considered in some detail.

In this statement, contributing background information about the
project, we shall concentrate on relationships among biomedical professional
members (principally AFIP staff) and computer scientist members
(principally Planning Research Corporation staff) of the MOD group.

At the outset it was realized that (as a generalization) those who
understood disease ecology were not competent to direct a computer
processing attack on the problem and that, conversely, those who were
competent in the area of computer science/technology did not understand disease
ecology. We concluded that, since computers were the means toward increased
understanding (not the end), biomedical scientists should lead in
development of the MOD system, seeking computer-oriented specialists to support
the very important computer processing aspects of the program.

Upon careful deliberation, and after it became clearly evident that
our chances of finding a top-notch systems analyst for short-term hire on
the open market were small indeed, we decided to employ the services of a
commercial group to help us with this highly critical aspect of our program:
System Analysis. We met with representatives and considered informal
proposals from four organizations:

Auerbach Corporation
Bunker-Ramo Corporation
* Planning Research Corporation
* Systems Research Group, Inc.

and received formal proposals from two of these*. In the course of our
deliberations we conferred with the AFIP Automatic Data Processing Section,
the National Bureau of Standards, and the University of Illinois, getting
advice and counsel in regard to the system which would be required for our
computer-based automatic mapping program.

On 5 July 1966, Subcontract UAREP 66.1, awarded to Planning Research
Corp. (PRC), became active. A critical factor in selecting this company
was  their  agreement  to  assign  two  highly  motivated  and  extraordinarily  well 
qualified  persons,  full  time,  to  our  project.  (Their  very  efficient  work 
and  the  full  cooperation  of  their  associates  at  PRC  more  than  justified 
our  decision.) 

The  initial  study  uncovered  many  complex  problems  which  were  more 
serious  than  could  have  been  anticipated  before  this  study,  and  a  second 
subcontract  was  let  to  PRC  in  August  1966  in  order  to  complete  analyses  of 
these  problems.  Joint  effort  and  close  cooperation  among  the  biomedical 
professionals  of  AFIP,  and  UAREP,  and  the  data-processing  professionals  from 
PRC  (and  also  one  employed  by  UAREP)  resulted  in  increased  understanding  of 
and tentative solutions to most of the problems which had been raised earlier.
This letter, photo-copied in its original form, explains the composition,
functions and conclusions of the Advisory Committee.

7 September 1967


MEMORANDUM  TO:  Dr.  Howard  Hopps 

Chief,  Division  of  Geographic  Pathology 
Armed  Forces  Institute  of  Pathology 
Washington,  D.  C.  20305 

SUBJECT:  Mapping  of  Disease  Advisory  Committee  -  Summary  Report 


I.  Introduction 


Following the recommendation of Dr. Herbert Pollack of the Institute
for Defense Analyses and the concurrence of appropriate officials at
Advanced Projects Research Agency and Armed Forces Institute of Pathology
an advisory committee was formed for the Mapping of Disease (MOD) Program
being conducted by the Geographic Pathology Division, AFIP. The
membership of this committee is listed below:


Mr. Fred I. Edwards, ARPA
Mr. Ronald Finkler, IDA
Dr. Allan L. Forbes, ARO
Mr. Joseph E. Hinds, Consultant
Dr. Howard Hopps, AFIP
Dr. Myles Maxfield, Consultant
Dr. Donald R. Sheldon, IDA

The terms of reference for this committee were to clarify over-all
governmental requirements relating to geographic aspects of disease and
to assist the project director in assuring optimal responsiveness toward
reaching these goals during the anticipated remaining 1.5 years of the
original program plan.

Meetings of the Advisory Committee were held on 24 May and 13 June
1967 at the Institute for Defense Analyses. In addition, Mr. Finkler
and  Dr.  Sheldon  held  separate  meetings  with  Dr.  Hopps  and  his  staff  on 
behalf  of  the  committee.  Their  findings  and  recommendations  have  been 
incorporated  in  the  body  of  this  report. 

At  the  outset  it  was  envisioned  that  the  Advisory  Committee  would 
play  a  continuing  role  in  the  development  of  the  MOD  program.  However, 
as  unexpected  and  irrevocable  budgetary  restraints  have  resulted  in 
termination  of  funding  for  the  program  at  the  end  of  current  obligations 
(November  1967),  there  seemed  little  need  for  continuing  activities  and 
this  report  is  intended  as  final. 
II. Findings

A.  General  Requirements: 

1.0  The  requirement  for  a  greater  understanding  of  the  global  and 
geographic  aspects  of  disease  extends  throughout  the  official  community 
and  includes  not  only  public  health  and  military  interests  but  also 
those  associated  with  international  development  and  foreign  relations. 

2.0  The  scope  of  these  interests  range  from  basic  considerations 
such  as  the  etiology  of  infectious  diseases  to  estimative  or  evaluative 
measurements  of  past  or  future  events.  Even  such  relatively  gross 
measures as calculation of current incidence of infectious disease on
a worldwide basis from crude data could benefit from appropriate
application of computer technology.

3.0 A complete response to the total dimensions of the problem
as defined above is clearly beyond the capability of monetary and
personnel assets allocated to the MOD Program even if support had been
continued at the projected rate. However, one application of computer
technology which would be basic to all aspects of the problem would be
a broad based data storage and retrieval system. In this regard, the
experience and expertise developed by the MOD staff particularly in
Data Extraction Format, Factors Cataloging and Data Structure Vocabulary
would be a valuable input.

B. MOD Program Review:

1.0 In keeping with the initial funding proposal objective, the
MOD  Program  concentrated  on  the  design  and  assembly  of  a  system  for 
computerized  mapping  of  disease  information.  The  program  plans  and 
study  methods  appear  reasonable  and  satisfactory  for  this  purpose. 

2.0  While  the  objective  of  providing  an  operational  disease 
mapping  system  will  not  be  met  by  the  conclusion  of  the  present 
contract  (Nov.  67),  much  valuable  information  concerning  how  such  a 
system  might  be  developed  has  been  derived. 

3.0  The  maximal  lasting  benefit  obtainable  from  the  MOD  Program 
during  the  remaining  period  of  support  will  be  the  preparation  of  a 
final  report  of  their  system  analysis  and  design  including  a  detailed 
evaluation  of  the  problems  associated  with  establishing  the  requisite 
data  base  for  operation  of  such  a  system. 

III. Recommendation

The final report of the MOD Study should be directed toward
providing a basic document which would serve primarily for the
indoctrination and guidance of any future programs that might be directed toward
the application of computer technology to epidemiological and geographic
aspects of disease.

To meet this objective the following outline lists those aspects
of the MOD Study which should be covered in the final report:


1.  An  introductory  statement  and  discussion  of  objectives. 

2.  A  summary  of  activities,  project  plans,  and  point  at  which 
project  had  to  be  terminated. 

3. A summary of accomplishments, summary of contracts with
outside organizations, and a list of all appropriate
documentation and reference material.

4. The results and discussion of the experiments in contour
mapping including the systems used, their limitations,
examples of the outputs, and appropriate ways disease
information may be presented on maps.

5. The complete system design specification including
descriptions of all files and file structures, program flow charts
and hardware configurations.

6.  A  complete  description  of: 

a)  the  data  base(s)  employed, 

b)  the  Data  Extraction  forms  and  instructions  on  their  use, 

c) the Factors Catalog,

d)  the  Data  Structuring  Vocabulary, 

e)  the  Query  Language  and  other  data, 

f)  a  glossary  of  appropriate  terminology. 


The assembly of the data and experiences of the MOD Project in a
manner similar to that described would constitute a worthwhile
contribution and should be accepted by the sponsoring agency as satisfactory
fulfillment of contractual obligations.


Concurrence  in  Draft, 
POINT AT WHICH THE PROJECT WAS TERMINATED


At  the  outset,  the  MOD  project  was  visualized  as  a  three-year  effort, 
and  it  was  planned  that  the  first  two  years  would  be  primarily  concerned 
with  system  analysis  and  system  design.  Implementation  was  planned  for 
the  third  and  final  year.  When  it  was  learned  that  ARPA's  support  of  the 
project was to terminate at the end of two years, all efforts were directed
toward  finishing  data  analysis  and  data  structuring  (method  design),  and 
completing  the  various  aspects  of  the  system  design  phase.  (In  the 
interests  of  efficiency,  we  had  begun  to  implement  some  portions  of  the 
system  so  that  we  could  more  effectively  design  other  portions.)  It  was 
our  feeling,  and  one  supported  by  the  Advisory  Committee,  that  a  completed 
system  design  would  represent  an  important  milestone  and  would  contribute 
information  of  value  to  other  groups  interested  in  comparable  problems  or, 
perhaps, in taking up where we left off in developing a system for the
computerized  mapping  of  disease-environmental  data. 

The system analysis and design have both been completed. (There are
several aspects of the system design that will need further elaboration,
but this cannot be performed outside the context of a partially implemented
system.) In addition, we have made an extensive analysis of data
characteristics: sources, limitations of the data per se, and problems involved in
preparing these data for computer input. A method for structuring data has
been designed and tested, and a comprehensive factor catalogue has been
produced. We have gained new insight into the characteristics of
disease-environmental data that allow them to be mapped, and have developed data
extraction forms reflecting these requirements.
In essence then, the MOD project was terminated just short of the
implementation phase. The figure below shows the extent to which the five
major component tasks were completed.


[Figure: TASKS - MOD. Chart showing the portion accomplished of each of
the five tasks: System Analysis and Design; Programming and Implementation;
Data Source Acquisition; Data Extraction; Data Entry (card punching).
Key: accomplished.]
FISCAL DATA

Support  for  the  biomedical  portion  of  the  MOD  effort  has  been 
provided,  in  large  measure,  by  the  Armed  Forces  Institute  of  Pathology 
since  most  of  the  biomedical  personnel  were  on  the  AFIP  staff.  Over  and 
above  the  contribution  made  by  the  AFIP,  the  total  cumulative  (estimated) 
cost borne by Contract # DA 49-092-ARO-130 was $167,077.25. A major
portion of these (ARPA) contract funds was used to provide
computer-science/technology support, principally by subcontract with the Planning
Research Corporation. The role of PRC has been discussed under Project
Plans, and will not be considered further here.

A  detailed  financial  statement  has  been  prepared  and  submitted  to 
the  appropriate  persons  of  ARO  and  ARPA. 


Foreword


The work described here is the product of a remarkable
interdisciplinary effort, involving medicine (human and veterinary), geography, geology,
cartography  and  computer  science  technology.  I  am  pleased  to  have  played 
a  part  in  this  important  study,  even  though  my  role  was  a  very  small  one. 

Dr. Hopps and his associates are to be congratulated on their
accomplishment so clearly described in this monograph. They have taken a
giant  step  forward  in  adapting  the  technologic  advances  of  information 
theory  and  computer  science  to  the  study  of  disease  ecology.  Their  work 
clearly  and  dramatically  demonstrates  the  feasibility  of  "computerized 
mapping of disease-environmental data", and they have produced the
blueprints for an effective system. Hopefully, this work will continue, and
the system which they have designed with such care will be implemented so
Preface


The  truth  is  rarely  pure  and  never  simple. 

Oscar  Wilde 


Communication means the perspicuous transmission of information. As
with all things, the means has to be appropriate to the end. This has been
an important consideration in the MOD project. Our emphasis on maps as a
means of display is because this is the most effective way to transmit
disease/environmental information to most persons. Often a map is the best
means of presenting interrelationships vividly and dramatically,
especially to the intelligent, concerned, non-medically, non-mathematically
oriented person who needs to know.

Maps allow quick and clear correlation and serve a very important
need, even for the medical expert. Overlaying and visual pattern comparing
is a very powerful process because it permits human detection of
relationships so complex that standard mathematical methods may be unable to
detect them. The rapid production of maps by computer gives an additional
great advantage; the process is so fast that one can get an up-to-date
presentation several hours after his request.

The computer techniques which allow map print out also allow print
out of figures and names to provide specific yes/no/or qualified answers,
lists of references, location of things by latitude/longitude coordinates,
political areas, etc., correlation coefficients, data for construction of
Simply stated, the objective of the MOD system is to provide a means
whereby the DISEASE PANORAMA can be quickly and effectively presented in
map form in a time context which is either current or historic.

We mean disease panorama to include location of
the disease at a particular time in terms of
prevalence, incidence, mortality, and morbidity,
within the population in toto, also its various
segments. But more than this: we mean it to
include also information as to the quality,
quantity, and location of those numerous
environmental factors which influence rate of
occurrence as well as character of the disease.

In a recent (unsigned) article on the role of computers, it was said:
"When all the pertinent facts are known, decisions make themselves." This
may well be true, but it assumes that the pertinent facts are distinct from
the great mass of non-pertinent facts, and that the degree of their
pertinence is recognized. This brings us to one of the critical points in
today's "information explosion". Data is increasing at an exponential rate
and is inundating us because of its enormous volume. We have by no means
solved the problem of converting data to information.

One of the very important problems in Geographic Pathology is how to
handle the great accumulation of knowledge so that what is known shall be
available when needed; and we are talking about information, in contrast
to isolated data. Geographic Pathology is not alone in facing this problem,
but it is particularly serious here because the pertinent information is
more widely scattered, and a higher proportion of it appears in the form
of conference proceedings, annual reports from isolated medical centers,
and the like. In fact, much very valuable information (stemming from
actual field experience) is not written down at all, but is nevertheless
available if the proper approaches are used. An inherent part of this
data processing problem has to do with separating the pertinent from the
non-pertinent, differentiating between the true information and the
misinformation (which abounds), all as a prelude to converting isolated facts
to correlated INFORMATION. Although electronic data processors are very
efficient at storage and retrieval of data, and can carry out very complex
searches for correlates, these machines cannot do the essential selection
and preprocessing of data which, among many other things, includes a value
judgement as to the validity of the data which is fed to them.

In the course of our work, critics have pointed out to us the
inherent limitations of many of the data that we would like to process. We
recognize this fact very well indeed. Certainly there are many places in
the world where the data base that deals with many disease situations is
altogether inadequate for any meaningful collation, much less effective
computer manipulation. No system of information processing can convert
bad data into good data. However, there are large pools of data (derived
from cultural anthropologists, economists, geologists, meteorologists,
agronomists, epidemiologists, veterinarians, pathologists, etc.), relating
to disease/environment situations, which could be meaningfully collated
and effectively computer manipulated.

Many  of  the  most  important  problems  have  the  softest  information, 
but we must identify what information there is, and learn its limitations.
We  must  work  toward  correcting  deficiencies  in  the  data  base,  but  even  more 
important,  we  must  develop  better  methods  of  using  what  information  is 
available. 

We are at the stage of world development where many important
judgments must be made in the absence of hard data. If we do not use what data
is available, what shall we use?

"Where  attainable  knowledge  could  have  changed  the 
issues,  ignorance  has  the  guilt  of  vice." 

Alfred  North  Whitehead 


The prime objective of the MOD project was to develop a system whereby
available  data  could  be  used  most  effectively  to  gain  new  insight  into  the 
multifactorial  causes  of  disease.  This  report  is  an  account  of  how  we  set 
about  to  do  this,  the  many  problems  that  we  encountered  --  and  our  efforts 
to  overcome  these  problems. 


HOWARD C. HOPPS, M. D.
People who are interested in data/information systems
are of two general types: The first is likely to say,
"I don't give a damn for your opinion; show me your
data." The second, "I'm not interested in the details;
I want information." The system we are describing in
this report would satisfy both types of user.


"A fresh instrument serves the same
purpose as foreign travel; it shows
things in unusual combinations. The
gain is more than a mere addition;
it is a transformation."

Alfred  North  Whitehead 
Introduction


ABSTRACT - This section discusses
multifactorial causes of infectious
disease and develops the concept of
disease ecology. Against this
background, the objectives of the MOD
project are presented, and the ways
and means of realizing these
objectives. Particular advantages of
the map-form, as a means of
displaying information, are discussed.


One cannot study man alone in relation to his
diseases because here, as almost everywhere
else, man is inextricably bound to his
environment.
1.0 GENERAL CONSIDERATIONS

We are faced with the problem that always arises when one addresses a
mixed audience. Presumably this audience (those who read this book) will
consist of experts, but some will be experts in the field of medicine,
or geography, or political economy, and know relatively little about data
processing or computer technology; others, highly skilled in various
aspects of information theory and communication science, will know
relatively little about medical geography or the biodynamics of disease.

We have tried to reach an acceptable compromise by laying a
groundwork of basic information at the beginning of each section before
commencing the technical discussion. To familiarize some of the non-medically
oriented readers with schistosomiasis and leptospirosis, a brief
discussion of each of these diseases is given in the Appendix. Since a common
ground of understanding is dependent upon knowing precisely what the
writer means when he uses certain words, we have provided a Glossary
(also in the Appendix), and this is divided into two parts: Part 1
dealing with computer processing terms, Part 2 dealing with biomedical
terms.


1.1  HOST-PARASITE  RELATIONSHIPS  AND  THE  ECOLOGY  OF  DISEASE 


In a general sense, the MOD project is primarily concerned with just
this: host-parasite* relationships. But there is a vast arena in which
these relationships develop, an arena which contributes in a very important
way to these relationships. Thus we must consider, but go well beyond,
factors which directly concern just the host or just the parasite. Some
of these additional factors affect the interface between the potential host
and potential parasite (relating to the critical act of infection, per se);
others affect the host's response to the parasite, and vice versa
(relating to the disease, per se).

*The term parasite is used here in its broadest sense to include all
living agents that live in or on a "host", deriving benefit from the
host; these agents include viruses, bacteria, spirochetes, yeasts
and fungi, as well as parasitic agents (in the narrow sense).

Factors  primarily  concerned  with  the  host  have  been  described  as 
follows (Hopps, H.C., Principles of Pathology, p. 388, 2nd ed.,
Appleton-Century-Crofts, New York, 1964).

Differences among individuals are, of course, very important
in determining the diseases to which they are susceptible,
and their reactions to the diseases once they contract them.
But patterns of disease, involving large groups of people,
is a very different matter, and provides a quite new
perspective in our study of disease. We can learn much of
value by looking into the reason for these varied patterns
of disease. The principal factors include: (1) Time, in
world history, (2) Age, i.e., time in the life of the
individual, (3) Race, (4) Sex, (5) Socio-economic
conditions and customs, and (6) Geographic location.

Factors primarily concerned with the parasite are more numerous and
more complex because, in addition to involving many physical and chemical
aspects of the environment, the parasites may be dependent upon
intermediate hosts and/or insect vectors to complete their life cycle, and may
also be dependent upon animal reservoirs as a means (either direct or
indirect) of reaching their definitive host. (For those who are not
biomedically oriented, it may be advantageous at this point to get some basic
orientation by reading the brief discussions of leptospirosis and
schistosomiasis that are to be found in the Appendix.)

Taking the host, the disease agent, and the environment all together,
one has a complex relationship that is properly termed the ecology of
disease.

An understanding of disease ecology is essential if one is to
comprehend the reasons why disease is such a varying entity: why the "same"
disease (in terms of its etiologic agent) can affect an occasional person
under some conditions and whole populations under others; why sometimes it
may be so mild as to escape detection and other times rapidly fatal.

Folke Henschen's description of the dynamic character of disease is
quite appropriate to this discussion (The History and Geography of Diseases,
p. 1 (Introduction), Delacorte Press, New York, 1966).


Diseases are not unchanging phenomena. Their appearance
and character are subject to historical development and
varying geographical and demographical conditions of
population. Some diseases seem able to disappear; other
new ones to appear. Infectious diseases, which only one
or two generations ago formed the largest group in our
statistics on morbidity and mortality, have been driven
back by the advance of medicine. Instead two other
groups of diseases, cardio-vascular diseases and tumours,
have taken the first place, a development which is partly
connected with a rising average expectation of life.
However, this revolution affects, above all, North
America and many of the countries of Europe. But even in
those countries whose populations form the great majority
of the world's inhabitants, a development in the same
direction has occurred. The overall picture of diseases
within one country or community, which one can call the
'disease-panorama', varies often from time to time, from
country to country, and from town to town.


1.2  OBJECTIVES  OF  THE  MOD  PROJECT 


Objectives  are  discussed  under  five  subheadings,  beginning  with  a 
definition of goals.

1.2.1 DEFINITION OF GOALS

At the 37th session of the Executive Board of the World Health
Organization (January 1966) Professor Murray Eden gave an important
"Statement of Communications Science" in the course of which he made the following
assertions:

• That medicine and biology are concerned with the observation
of patterns which in principle can be made precise only with
the help of mathematics.

• That computers and computer science can open up an entirely
new way of studying health problems.

• ... that there is a language barrier between the
communications scientists on the one side and the physicians,
biologists and health administrators on the other; a barrier
which must be breached if these new techniques are to work
together with medical science (and they must work together)
for the betterment of the health of people.

The MOD project was undertaken (in 1964) with precisely these ideas in
mind.

The Computerized Mapping of Disease Project (MOD) has two principal
objectives. The first, and most important, is to develop a system that will
provide for:

(1) Recording, classifying, collating, and validating a
wide variety of medical-environmental data.

(2) Preprocessing the medical-environmental data so that it
can be computer processed.*


* One of the major problems is to structure a data analysis vocabulary,
developing a hierarchical system for the qualitative and quantitative
characterization of disease/ecologic information. This requires cutting
across disciplinary boundaries, identifying the "common denominator" of the
various jargons, and converting the narrative and tabular data into a
miscible form. (This is quite different from developing a dictionary or
thesaurus.)

(3) Development of a storage/retrieval mechanism to act
upon  such  preprocessed  medical  data,  together  with  a 
complex  editing  program  that  will  allow  updating,  that 
will  provide  for  immediate  identification  of  material 
in  conflict,  and  that  will  print  out  specific  data 
sources. 

(4)  Development  of  programs  that  will  allow  manipulation 
of  the  data  to  show  significant  interrelationships. 

(5)  Development  of  programs  whereby  the  computer  can 
"instruct"  a  plotter  to  prepare  contour  maps  reflecting 
quantitative aspects of incidence/prevalence of specific
diseases, together with distribution of a wide variety
of  causally  related  factors,  e.g.,  climatic  factors, 
soil  factors,  animal  reservoirs  and/or  insect  vectors, 
characteristics  of  the  human  population,  etc.,  etc. 

(6)  Development  of  programs  whereby  supporting  information 
(to  accompany  the  maps)  can  be  printed  out,  extending 
the  usefulness  of  the  mapped  medical  information. 

(7) Development of programs whereby other types of graphic
display can be generated to show cause/effect relationships
(e.g., line graphs) pertaining to prevalence
and/or incidence of a given disease.
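
Item (3) in the list above asks for an editing program that identifies
material in conflict and prints out the data sources. The following is a
minimal sketch of that idea in present-day Python, not a description of the
MOD programs themselves. The record layout, the key of (disease, place,
year), the names Entry, DataFile and add, and the sample values are all
hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One stored datum. The fields are illustrative assumptions."""
    disease: str
    place: str
    year: int
    cases: int
    source: str  # citation for the datum, kept so it can be printed out


class DataFile:
    """Hypothetical store that flags conflicting entries as they arrive."""

    def __init__(self):
        self._by_key = {}

    def add(self, entry):
        """Store the entry and return earlier entries that disagree with it."""
        key = (entry.disease, entry.place, entry.year)
        earlier = self._by_key.setdefault(key, [])
        conflicts = [e for e in earlier if e.cases != entry.cases]
        earlier.append(entry)
        return conflicts


# Made-up example values, not project data.
data_file = DataFile()
data_file.add(Entry("leptospirosis", "County A", 1965, 12, "Source X"))
newest = Entry("leptospirosis", "County A", 1965, 19, "Source Y")
conflicts = data_file.add(newest)
for old in conflicts:
    print(f"Conflict: {old.cases} cases ({old.source}) "
          f"vs. {newest.cases} cases ({newest.source})")
```

Keeping the source with every entry is what allows the conflict report to
name both citations, which is the behavior item (3) describes.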

The second objective of the MOD project is to produce meaningful maps
(and other graphic displays) that show the distribution of a disease(s) in
terms of prevalence, incidence, severity, etc., along with distributions of
selected causally related factors. Quantitative as well as qualitative
aspects will be considered, with major emphasis on contour-type maps, the
contour lines representing isarithms.

By using a computerized system of analysis and output, it will be
possible to produce distribution maps in a matter of minutes rather than
months, as has previously been the case. This will allow up-dating whenever
required. Furthermore, such a system will permit the production of many more
maps than would otherwise be practical, covering a wide range of ecologic
factors. As desired, these could be printed on transparent stock, suitable
for overlay assembly in order to compare one pattern of distribution with
another, etc.* In the past, the time involved in preparing disease
distribution maps has been prohibitive in terms of maintaining current information.
This is reflected by the fact that, to date, there have been only two major
contributions in this field:

A  Geographic  Atlas  of  Disease  prepared  by  the  American 
Geographical  Society  and  published  during  1950-55. 

A  World-Atlas  of  Epidemic  Diseases  edited  by 
Professor Ernst Rodenwaldt (Heidelberg) and published in
1952  (but  reflecting  data  gathered  some  years  before). 

From a broader point of view, the MOD Project (Mapping of Disease) is
an effort to illuminate the geographic pathology of disease. Geographic
pathology is, in a sense, a kind of comparative pathology -- one in which
place (rather than species) is the primary variable. Geographic pathology
attempts to answer the questions: What (disease); Where (is it) -- and
When; and Why (is it there). Of course geographic pathology includes aspects
of epidemiology since it, also, is concerned with prevalence and incidence
and the interplay among complex causal factors, but it goes beyond
epidemiology in its concern for the pathogenesis and the pathologic effects of the
disease under study.


* There is virtually no computer limitation of map scale; as the geographic
area to be covered decreases (the size of the map remaining constant) the
map scale varies inversely and "resolution" increases.

Potential applications of the MOD system, in addition to producing
disease distribution maps, per se (and other graphic displays), and in
helping to determine causal relationships, include: (a) use in evaluating
probable (disease) consequences from particular changes in ecology, and (b) use
in developing mathematical models by which one may predict major changes in
disease incidence, e.g., epidemics.

Although the MOD system has been developed with primary concern for
infectious diseases, the system is applicable to virtually any problem area
in which "things" need be considered in a time/location context.

Summarizing,  the  MOD  project  is  an  effort  to: 

(1) characterize input data (relating to disease/
environment) in such a way that they can be stored
and readily retrieved in context by a computerized
system which, (2) using these data, can relate,
meaningfully, prevalence/incidence/character of
disease to a variety of direct and indirect causal
factors, and (3) output the information directly in
map form.

1.2.2  MAPS  AS  A  MEANS  OF  DISPLAYING  INFORMATION 

Maps were chosen as the principal pattern-form to display information
pertaining to disease (but not the only means) for two reasons: First,
because those areas in which the distribution of disease agent and host
overlap mark the geographic regions where the disease can occur; evaluation of
such ecologic factors as temperature, rainfall, humidity, the amount and
mineral content and pH of surface water, agricultural practices, population
densities of various plants and animals (including man), the kinds of people
involved (not only age and sex, but race, ethnic group, and tribe) and their
customs -- and a hundred other factors closely tied to geographic location --
can help us to determine where the disease will occur, and how it will be
manifest. Second, because maps have a unique advantage over most other forms
of graphic display for the general reasons that:

(1)  Extensive  and  continual  usage  of  map  forms,  beginning  in 
early  childhood,  has  conditioned  most  (educated)  people 
to  an  intuitive  understanding  of  maps. 

(2)  The  map  is  ideally  suited  to  a  consideration  of  multiple 
factors  simultaneously  (e.g.,  place  —  both  geographic 
and  political  —  in  relation  to  topography,  population 
density,  the  location  of  towns  and  cities,  the  location 
and  character  of  transportation  routes,  and  time  zones). 

(3) Through the use of rather simple devices such as isarithms
(isopleths) one can achieve a three dimensional effect
in a two dimensional presentation (quantity becomes the
third dimension; quality and location the other two).


We believe that a mechanism/system which can produce many kinds of map
patterns quickly, in response to specific query, will offer two very important
advantages: First, such a mechanism will make it possible to have current
information about the distribution of specific diseases and the distribution
of known causally related agents or conditions. Second, the rapid
availability of a large number and wide variety of disease-environmental maps will
give the observer an opportunity to compare location patterns of unknown but
possibly related ecologic factors and, in this way, help him to identify
causal relationships that might otherwise have escaped notice.

Two  recent  articles  in  Nature  describe  the  present  situation  well  in 
terms  of  needs  and  accomplishments  in  the  field  of  automatic  data  processing/ 
computer  mapping,  and  these  comments  are  very  pertinent  to  the  MOD  project. 

25 March 1967 -- ". . . only a tiny proportion of the mass
of demographic and climatic information collected by
governments ever sees print in map form. Information is
simply tabulated by area, and the possibility of spotting
regularities or correlations -- say the incidence of
pellagra, family expenditure on food, and the provision
of medical services in the eastern United States -- is
very remote indeed. Such interesting relations as have
been found are the result of years of searching, and
merely increase the sense of frustration that there is no
better way of doing it."

15 April 1967 -- "In the environmental sciences, a
shortage of information has been turned into a flood
by advances such as automatic data loggers, photographic
survey and satellite instrumentation, but
methods of using this information have been left
behind. Maps are the best way of making the
environmental information comprehensible, but they
are still prepared by slow, expensive and inflexible
process. Professor Linton of the Department
of Geography at the University of Birmingham
described the process bluntly as "cottage industry".
So far, unfortunately, it has not proved possible
to automate the process. Automation is attractive
because even if it did not make the process simpler,
it would certainly speed it up. In addition
to being a great advantage in conventional
topographic mapping, the increase in speed would
encourage people to include a locational element
in observations which are not at present mapped at
all. . . . observations, such as the census returns,
could then be made available in map form as
well as in statistical form.

Maps also offer a powerful research tool which
many people believe is under-exploited because of
the difficulties and expense in preparing maps by
hand; maps showing the spread of diseases can be
compared with those showing diet, economic
circumstances, or the availability of medical services,
and reveal unexpected correlations -- lead in the
soil and the incidence of mental disease, for
example."

1.2.3  SELECTION  OF  DISEASES  TO  STUDY 

The major objective of the MOD project is to develop a method that
will give new insight into disease-environmental relationships, but this
cannot be done effectively without reference to real life situations.
Disease models are necessary, not only to develop methods, but to test them.
Our choice of diseases to study was governed by the following
considerations:

(1) There must be adequate information about the
disease in terms of amount* and qualitative/
quantitative factors, and it must be possible
to connect both of these items to geographic
loci.

(2)  The  disease  must  be  one  in  which  geographic 
distribution  reflects  ecologic  requirements. 

(3) The disease should be one in which the prevalence
and/or incidence is related to certain variables
in a way that is understood (at least partially) --
and the variables must be measurable and of known
geographic distribution.†


Selection of the disease area to study was influenced by these
considerations:

(1) The area should be fairly large (to test the graphic
generator programs).

(2) There should be several hierarchical levels of political
units in the area (e.g., country, state and county).

(3) Disease-environmental data should be rather uniformly
distributed over the area, i.e., there should be
few "unknown" or "unreported" subsections within the
area.


* The disease must be positively diagnosable and reported. Furthermore,
effects of the disease should persist for a long while unless the
disease is readily apparent in its acute form and continually
searched for.

† Man to man transmitted diseases should be avoided since
occurrence is primarily influenced by intimacy of contact
and "immunity". This excludes most upper respiratory
infections.

Of the many possible diseases for study, based upon the considerations
given, we have concentrated upon leptospirosis and, to a lesser extent,
schistosomiasis and rabies. These were carefully chosen for several reasons:
(a) they are important diseases; (b) we (in the Geographic Pathology
Division) know a good deal about them; (c) highly reliable laboratory diagnosis
is possible; (d) they are wide-spread in distribution, but not diffuse; (e)
more reliable distribution maps are badly needed; (f) each of the diseases
poses specific data processing challenges in relation to important ecologic
factors. For example:

Leptospirosis:

• Involves many mammalian reservoirs (100+),
both domestic and wild.

• Occurs throughout most of the world.

• Prevalence is greatly influenced by the amount
and nature of surface water in the area,
including pH, mineral content, rate of
evaporation, etc.

• Prevalence is greatly influenced by
occupational and/or recreational habits
of human beings.

• Severity varies markedly, depending upon
serotype and many other factors.

1.2.4  THE  DATA  BASE 

There  are  three  basic  parts  to  the  MOD  system  and  these  intimately 
relate  to  each  other  in  the  sequence  shown  below. 

Locate and get data sources

Select/extract/format to produce a DATA BASE FILE

Design/implement software/hardware system
(giving special consideration to form of output)

The data base is, obviously, an essential (key) ingredient of the
system since it provides the substance upon which the software/hardware
components act.

But (as shown previously) there are two aspects to the data base:
(a) collecting the raw data and (b) preprocessing this so that it can be
effectively stored, retrieved, manipulated, and output as information.
The difficulties in getting the raw data are very considerable, and should
not be minimized. But the greatest problem that the bio-medically oriented
person encounters when he attempts to use a computerized data processing
system is in preparing (preprocessing) the medical-environmental data for
computer input.

This problem arises because the bio-medical terms are not direct and
simple; they do not have clearly defined single quality/quantity
characteristics. This preprocessing phase of the procedure is aptly described by
the word translation -- if one understands that we are not talking about
translating a foreign language into English, or one machine language into
another, etc. Translation, in this particular sense, means the reduction
of complex data terms to bits which have a common denominator, so to speak,
and are thus compatible with all the other bits of data which bear
relationship. One can combine oranges, and grapefruit, and pineapple, and
coconut -- and get ambrosia -- but there is a limit. One cannot combine,
meaningfully, oranges, and minutes, and square miles, and altitude, as such.
But, given a proper data structuring method, one can combine, meaningfully,
orange trees, time (in a calendar sense), square miles (in a location
sense), and altitude -- along with temperature, rainfall, human population
(the availability of agricultural laborers, the numbers of people who drink
orange juice, etc.) and the prevalence and incidence of carotenemia -- and
produce distribution maps which show important interrelationships among
these items.

A simpler, and more common problem in translation comes when we must
make compatible two statements such as: (1) "In the early winter of 1964
there occurred a mild epidemic of influenza among the urban population of
northwestern United States," and (2) "During the period 6-29 November 1964,
the prevalence of type-A influenza infection was 47% greater than for a
similar period in 1963 among the inhabitants of cities with populations
over 33,000 in Washington, Oregon, Idaho, and western Montana." Both
statements are, obviously, very closely related, and we have no difficulty,
subconsciously, in fitting them together. But without a precise system for
translation, these two data-complexes would be "considered" by a computer
to be entirely unrelated.
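
The translation problem posed by the two influenza statements can be
sketched in present-day Python. This is an illustration under stated
assumptions, not the MOD data structure: the lookup tables SEASONS and
REGIONS, the date range assigned to "early winter", the reduction of
"western Montana" to the state code MT, and the names Observation,
is_prefix and related are all hypothetical. Once both statements are
reduced to a hierarchical disease term, a calendar interval and a set of
places, a simple comparison can recognize them as related.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup tables; a working vocabulary would be far larger.
SEASONS = {("early winter", 1964): (date(1964, 11, 1), date(1964, 12, 31))}
REGIONS = {"northwestern United States": frozenset({"WA", "OR", "ID", "MT"})}


@dataclass(frozen=True)
class Observation:
    disease: tuple     # hierarchical term, general to specific
    start: date
    end: date
    states: frozenset  # state codes; "western Montana" is simplified to MT
    note: str = ""


def is_prefix(short, long):
    """True if the term `short` is the same as, or a parent of, `long`."""
    return long[:len(short)] == short


def related(a, b):
    """Two observations are related if disease, time and place all overlap."""
    same_disease = (is_prefix(a.disease, b.disease)
                    or is_prefix(b.disease, a.disease))
    overlap_time = a.start <= b.end and b.start <= a.end
    overlap_place = bool(a.states & b.states)
    return same_disease and overlap_time and overlap_place


start, end = SEASONS[("early winter", 1964)]
statement_1 = Observation(("influenza",), start, end,
                          REGIONS["northwestern United States"],
                          "mild epidemic, urban population")
statement_2 = Observation(("influenza", "type A"),
                          date(1964, 11, 6), date(1964, 11, 29),
                          frozenset({"WA", "OR", "ID", "MT"}),
                          "prevalence 47% above the same period in 1963")

print(related(statement_1, statement_2))
```

With both statements translated in this way the comparison prints True.
The qualitative remainder ("mild", "47% greater") is carried along as a
note here and would need its own structured treatment.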

The problems of preprocessing data are further complicated in
connection with the MOD system because, as we have mentioned before, any broad
consideration of disease-environmental relationships must utilize data
drawn from a group of different disciplines each of which, in a sense, has
its own scientific language -- geography, geology, agronomy, political
economy, cultural anthropology, pathology, etc., etc., but this aspect of
the problem, with other aspects, is treated extensively in Section 4,
Data Characteristics.

1.2.5 HARDWARE/SOFTWARE CONSIDERATIONS

Speaking about multifactorial etiology of disease, Professor A. Payne
in his "Statement on Epidemiology" to the Executive Board of WHO (20 January
1966) said:

"These changed concepts and increased complexities require the
development of new theoretical and analytical approaches. We
can no longer be content with the solution of simple situations
such as one agent of known infectivity, incubation period, etc.,
in a population of known density and immune status. Mathematical
models, which involve the translation of real world problems
into symbols and numbers, already exist which enable us to
predict, within reasonable limits, the outcome of the
introduction of an agent into such a situation."

"The new concepts demand the formulation of models many times
more complex and require both highly sophisticated mathematical
treatment and advanced computer technology. Complex data of this
kind cannot be handled in any other way, and formulation of new
models demands the aid of mathematicians and computer scientists."

We agree entirely with Professor Payne, and it was this point of view
that led us to undertake the MOD project nearly three years ago.

The MOD effort has been directed by disease-oriented rather than
computer-oriented persons because, from the outset, we realized that computer
processing was the means to an end. We have worked diligently so that the
criticism leveled by Dr. A. Feinstein at (some) computer technologists would
not be appropriate for us: "They may understand the machine, but not the
problem; mathematical theory, but not the nature of the problem -- the
statistics may be excellent, but irrelevant." These statements are not
meant to demean the role of the computer nor the systems analysts, programmers,
etc. Sections 6 and 7 of this volume give clear evidence that we do
appreciate the importance (and complexity) of computer technology.

Continuing with our very general consideration of computer processing,
it is appropriate to point out that automation does not make a process
simpler, it simply speeds it up. From a practical viewpoint, however, the
great speed of operation allows manipulation of a volume of data that would
otherwise be virtually impossible to handle. In relation to the MOD system's
(ultimate) requirements we have estimated that on the order of 10^30 possible
factors may need to be considered (not necessarily used)!

In addition to speed of operation, automation has another very
important asset. It provides for a consistency of handling data that,
otherwise, would not be attained. In turn, this consistency of handling forces
a clear and sharp characterization of the data input, the query, and the
information output.

There are many conventional aspects to the hardware/software
requirements of the MOD system, but there are two unconventional aspects. The
first relates to the storage/retrieval and manipulation of uniquely
structured input data that deal with medical-environmental situations -- processes
which include a complex Dictionary File that recognizes errors and allows
for their correction, that provides for updating, and that also performs a
gazetteer function. The second has to do with developing computer programs
to contour-plot disease-environmental data -- data that are often represented
by relatively sparse data points.
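
Contour-plotting from sparse data points generally requires estimating
values on a regular grid first. The sketch below, in present-day Python,
uses inverse-distance weighting for that step. It is a common textbook
technique chosen here for illustration and is not claimed to be the method
used in the MOD programs. The function names idw, interpolate_grid and
band, the sample points, and the contour levels are all assumptions.

```python
def idw(x, y, samples, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) samples.

    Each sample is weighted by 1 / distance**power, so nearby points
    dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist_sq = (x - sx) ** 2 + (y - sy) ** 2
        if dist_sq == 0.0:
            return value  # exactly on a data point
        weight = dist_sq ** (-power / 2.0)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def interpolate_grid(samples, xs, ys, power=2.0):
    """Return rows of estimates, one row per y and one column per x."""
    return [[idw(x, y, samples, power) for x in xs] for y in ys]


def band(value, levels):
    """Index of the contour interval holding value (0 = below all levels)."""
    return sum(1 for level in levels if value >= level)


# Made-up sample points: (x, y, cases per 100,000).
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0), (5.0, 8.0, 50.0)]
xs = [0.0, 2.5, 5.0, 7.5, 10.0]
ys = [0.0, 4.0, 8.0]
levels = [20.0, 40.0]

# Print a crude line-printer style map, north at the top.
for row in reversed(interpolate_grid(samples, xs, ys)):
    print(" ".join(str(band(v, levels)) for v in row))
```

The boundaries between printed band numbers approximate the isarithms at
the chosen levels; a plotter program would trace those boundaries as lines.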

*  *  * 

In a sense, the hardware/software is the computer, and the computer
is the milieu in which the raw (input) material is converted to the finished
(output) product. The figure below illustrates this relationship and also
points up the fact that one important interface exists between the input
group and the computer, and another between the user group and the computer.


[Figure: The two levels of interface. Labels recoverable from the
original: INPUT group; cultural anthropologic (incl. sociologic);
lists, tables, block diagrams, maps; "corrected for population,
for 1965, compared with 1966 and 1967".]


2 TECHNICAL SUMMARY


ABSTRACT - This section is a general
consideration of what has been accomplished
during the approximately
30 months that the MOD project has
been active, supplementing and complementing
the (prospective) view
presented in the Introduction, and
the General Summary, Conclusions, and
Recommendations, which comprise
Section 0. In addition, it explains
the organization of this book.


In every problem situation we are faced
with the twin questions:

• What do we know?

• What do we not know?

When we answer these questions, then
we are ready to make progress in
understanding; when we have gained
understanding, then we are prepared to
make decisions.

Section 1, the Introduction, considers the MOD project as a whole and
summarizes many aspects. Furthermore, each section contains its own
abstract. These are reasons why this technical summary is quite brief.

The Geographic Distribution of Infectious Disease Project is
characterized by its unique conceptual framework and the variety of scientific
disciplines it represents: geography, cartography, geology, meteorology,
agronomy, biology, medicine, political economy, cultural anthropology,
systems analysis, computer systems design, computer programming, etc.

Because of this multidisciplinary involvement, certain fundamental
questions concerning the output desired from the system and the potential
users of that product had to be resolved before beginning system design.

Useful types of output were determined to be in the form of maps,
graphs, and narrative reports. Maps were selected as the principal device
to display areal relationships, and were investigated in detail and defined:
both conventional maps and special maps, suited to the study of geographic
distribution of disease-environmental data.

After determining the output desired from a geographic
disease-environmental data system, input data characteristics were considered. It
was necessary to develop data-structuring terminology before further progress
could be made, and this was accomplished. MOD* data requirements were
then compiled in the form of a catalogue of disease and environmental
factors, and both minimal and ideal data needs were outlined. Data sources
were considered and the many problems posed by particular characteristics
of data available for use in the MOD system were analyzed.

* The term, MOD, used throughout this report is an acronym derived from
the initial letters of the three words that comprise the abbreviated name
of the project: Mapping of Disease.

The combination of input and output requirements for the MOD system
requires computerized techniques for solution. In considering these
techniques, the requirements for output devices, input devices, and central
processing units were determined. Output devices are a primary
consideration since the system design is based upon the required output.
Cathode-ray-tube (CRT) devices, digital plotters, and line-printers were all
investigated in terms of MOD system use. Although input devices are a
secondary consideration, optical character recognition (OCR) machines,
punched paper tape readers, card (or tape card image) readers, and
digitizers were all analyzed with respect to possible use in the system. Data
processing methods must be considered in any discussion of computer systems,
and the basic data processing techniques and commonly-used languages were
investigated. Detailed conclusions relative to the integration of computer
equipment and data processing methods are presented in the appropriate
sections (especially Section 6).

The  system  design  specifications  for  the  MOD  system  have  been  pre¬ 
pared  in  varying  levels  of  detail.  Included  in  this  report  (Section  7) 
are discussions of the four subsystems comprising the MOD system: storage,
retrieval, synthesis, and output.

* * *

Throughout this report many figures are presented to illustrate problems
encountered in designing the MOD system, and our effectiveness in
overcoming these problems. Actions speak louder than words, and we believe
that the computer-simulated, manually drawn maps and the computer/line-printer
and computer/plotter produced maps showing disease-environmental data speak
loudly in support of our conclusions that: (1) the computerized mapping
of disease-environmental data is feasible; (2) the MOD system design we
have described here represents an effective "blue-print"; and (3) this
system should be implemented as soon as possible because there is a
widespread and pressing need for the type of data processing and output that
the MOD system would produce.

2.2 METHOD OF APPROACH

The  reader  is  due  some  explanation  of  why  this  report  was  organized 
as  it  is  presented  here.  The  method  of  our  approach  to  the  problems  of 
designing  the  MOD  system  had  a  major  influence  on  the  way  we  have  ap¬ 
proached  this  account  of  our  activities. 

The Preface and Introduction set the stage, so to speak, by describing
the objectives of the project and considering, in general, ways
and means of attacking the many problems.

Since the data are of primary consideration, Data characteristics
and Data collection are considered before Computer system requirements
(an essential factor in system analysis) and Data processing (a description
of system design).

After  this  background,  the  Section,  Output  usage,  describes  opera¬ 
tional  procedures  and  potential  applications  of  the  system. 

Finally, a General summary, conclusions and recommendations are given.

The  Appendix  presents  a  variety  of  useful  information  to  supplement 
that  contained  in  the  major  portion  of  the  report. 
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ABSTRACT  -  This  section  considers, 
in detail, the types of output required
of the MOD system. Since maps
of disease-environmental data are of
major concern, the various types of
maps are explained, along with how to
construct them. Block diagrams and
graphs are also discussed in the context
of disease-environmental relationships.
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"The  true  purpose  of  knowledge  resides 
in the consequences of directed action."

John Dewey

3.0 GENERAL CONSIDERATIONS

The most critical part in developing any automated data processing
system is the determination of precisely what the output (result) should be.
This is necessarily so because output requirements directly influence
virtually every step in development of the system. This section gives a
comprehensive evaluation of those output considerations which were the basis
for designing the MOD system, and considers also the other side of the coin:
input. In a sense, input is the cloth from which the garment is made;
system design is the pattern to which it is cut; and the computer is the
sewing machine.

Since  output  is  of  no  value  unless  it  is  put  to  good  use,  any  con¬ 
sideration  of  output  is  sterile  unless  the  potential  user  is  also  considered. 
For  convenience,  output  usage  is  considered  in  detail  in  Section  8  (after 
the  entire  system  has  been  discussed),  but  output  usage,  as  we  conceived  it, 
served  as  a  constant  guide  in  output  analysis.  Obviously,  the  output  of 
the  MOD  system  is  information,  information  directed  primarily  toward: 

• Presenting quantitative aspects of disease-environmental
data in relation to place and time.

• Identifying the multiple causal factors in a given
disease, and their interrelationships.

• Determining interrelationships, if any, among several
different diseases occurring together, e.g., schistosomiasis,
iron deficiency, protein malnutrition and tuberculosis.

• Evaluating the impact of the disease upon socioeconomic
aspects of the area, military operations, etc.

•  Anticipating  the  effects  of  altered  ecology  on 
incidence  and  manifestation  of  disease. 

•  Predicting  variations  in  incidence  which  are  likely 
to  occur  in  the  foreseeable  future  —  on  the  basis 
of past history and trend analysis.

A system containing generalized disease information could provide output
that would satisfy economic- or cultural development-oriented people
who were attempting to assess the influence of specific diseases upon the
development of a particular country or society (e.g., AID). Such a system
would also serve an industrial group planning to establish a base of
operations, guiding them in immunization and other prophylactic measures, in
the design of medical facilities, in the types and specific locations of
houses, dormitories, etc. In other words, such an input/output system would
be particularly useful to decision-makers who required information about
geographically oriented disease conditions over comparatively broad regions.

A system containing more detailed information, on the other hand, would
be required to satisfy biomedical researchers concerned with in-depth
investigations of disease-environmental situations, particularly causal
relationships. Similarly, more detailed information would be useful to
public health officials whose major objective was surveillance of specific
diseases.

As discussed in Section 4, Data Characteristics (4.4.1), the method by
which the data is structured permits it to be entered at any level of
generality, and to be retrieved at any level equal to or more general than
that at which it was entered. In other words, highly specific data can be used
in a broad or general way, but not vice versa. Thus, the more detailed the
data input, the more potential users could be satisfied. Because of this,
the MOD system, including data extraction forms, has been designed to
receive and process data in its most detailed form (when available): detailed
as to precise geographic location as well as to specific qualitative and
quantitative characteristics.
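
The entry and retrieval rule just stated can be illustrated with a minimal modern sketch in Python. It is not part of the MOD design: the three-level hierarchy (country, province, locality), the place names, the case counts, and the function name `retrieve` are all hypothetical and are used only to show that detailed records can be rolled up to a more general level, while a record entered at a general level cannot be assigned to a more specific one.

```python
# Hypothetical records. Each is keyed by the most detailed location known
# at entry time, written as a path from general to specific, followed by
# an invented case count.
RECORDS = [
    (("Country A", "Province 1", "Locality x"), 12),  # entered at locality level
    (("Country A", "Province 1"), 30),                # entered at province level
    (("Country A", "Province 2", "Locality y"), 7),
    (("Country B",), 5),                              # entered at country level only
]


def retrieve(records, query):
    """Sum the counts of every record entered at or below the queried level.

    A record entered at a more general level than the query cannot be
    assigned to it, so it is left out of the total.
    """
    depth = len(query)
    return sum(
        count
        for path, count in records
        if len(path) >= depth and path[:depth] == query
    )


print(retrieve(RECORDS, ("Country A",)))
print(retrieve(RECORDS, ("Country A", "Province 1")))
print(retrieve(RECORDS, ("Country A", "Province 1", "Locality x")))
```

Note that the province-level record contributes to the country and province totals but not to the locality total, which is the "not vice versa" condition in the text.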

3.1  TYPES  OF  OUTPUT  CONSIDERED 

The information to be output by the proposed MOD system can take several
forms. For purposes of this discussion these are: narrative reports (i.e.,
listings or tables), graphs, maps, and block diagrams.


3.1.1 NARRATIVE AND TABULAR REPORTS

Today, most computer output takes the form of "hard-copy" reports,
i.e., printed words and numbers arranged in lists, tables, or narrative-like
prose. The techniques for producing these are well known and need not be
discussed in detail here. It is important to realize, however, that a
computer system cannot, ordinarily, combine data stored in free prose or
narrative form and produce summaries of such data; but it can summarize rather
rigidly formatted data and produce meaningful short reports. Although the
MOD system concentrates on output in the form of maps, narrative and
tabular reports are also an important output product because they are required
to display such items as input data, the contents of a data file, data
retrieved by queries, data to be used in generating other forms of output,
etc.
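
As an illustration of why rigid formatting matters, the following Python sketch reads fixed-column records and produces a short summary. The column layout (12-character name, two 7-character coordinates, 3-character value), the locality names, and the values are assumptions made for this example only; they are not the MOD data extraction format. "NR" is used for "not reported", as in Figure 3-1.

```python
def pack(name, lon, lat, value):
    """Build one fixed-width record: name, longitude, latitude, value."""
    return "{:<12}{:>7.2f}{:>7.2f}{:>3}".format(name, lon, lat, value)


def unpack(card):
    """Read a record back strictly by column position."""
    raw = card[26:29].strip()
    return {
        "name": card[0:12].strip(),
        "lon": float(card[12:19]),
        "lat": float(card[19:26]),
        "value": None if raw == "NR" else int(raw),
    }


def summarize(cards):
    """Count localities and average the values that were reported."""
    rows = [unpack(card) for card in cards]
    reported = [row["value"] for row in rows if row["value"] is not None]
    return {
        "localities": len(rows),
        "reported": len(reported),
        "mean": sum(reported) / len(reported) if reported else None,
    }


# Invented records for three hypothetical localities.
CARDS = [
    pack("LOCALITY A", -38.5, -12.9, 25),
    pack("LOCALITY B", -43.9, -19.9, 10),
    pack("LOCALITY C", -60.0, -3.1, "NR"),
]

for card in CARDS:
    print(card)
print(summarize(CARDS))
```

Because every field sits in a known column, the summary needs no interpretation of the text; the same cannot be done for data stored as free prose.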

Figure 3-1 shows a set of data that was extracted from Malek (in May,
1961), and illustrates the kind of tabular output useful in studying a
disease situation. (The project team added longitude and latitude coordinates
to the data and rounded the data values to make them more easily comparable.)
This set of data was used as a standard set for investigating mapping
techniques, and it appears throughout this report in various forms. We
emphasize that these particular data are quite limited in scope, and that
their primary use has been in developing methods for various computerized
outputs during design of the MOD system.

If for each geographic locality the rat population density were plotted
as the X coordinate, the pH value of the surface water as the Y coordinate,
and the prevalence of leptospirosis as the Z coordinate, then a contoured
graph could be constructed, like that in Fig. 3-2A. The same data could also
be plotted as a family of curves by taking, for each locality where the rat
population density was a particular value, the water pH value as an X
coordinate and the leptospirosis prevalence number as a Y coordinate, and
this is shown in Fig. 3-2B.

[Listing heading] DATA POINTS FOR INFECTION RATE (PERCENT) OF
SCHISTOSOMIASIS DUE TO SCHISTOSOMA MANSONI IN MAN, AS GROUPED BY PROVINCES
AND SMALL COUNTRIES. EXTRACTED FROM MALEK IN MAY, 1961, STUDIES IN DISEASE
ECOLOGY, P. 305-313. INTERPRETED BY MOD STUDY TEAM.

Figure 3-1 Listing of the standard set of South American schistosomiasis
data used during MOD mapping studies ("NR" means not reported).

Figure 3-2-B A family of two-variable curves from a hypothetical
leptospirosis situation. (Horizontal axis: pH of surface water, acidic to
basic. Vertical axis: prevalence (% infected) of human leptospirosis.)

Figure 3-2-C (upper) A two-variable graph, and 3-2-D (lower), a
three-variable (contoured) graph, both based upon real data: schistosomiasis
in Brazil. (Horizontal axis of both graphs: mean total annual rainfall, in
inches. Temperature and rainfall data derived from material in Atlas,
12th ed., 1964, copyright by Rand McNally & Co. R.L. 68 S 86; used with
permission.)
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Because  of  the  potential  power  of  these  graphing  methods  we  also 
experimented  with  some  actual  data.  The  standard  set  of  schistosomiasis 
data  provided  disease  factors  and  two  other  sets  of  factors  were  obtained 
from existing, readily available maps: annual rainfall (Rand McNally, 1964,
p. 97, upper left), and July normal temperature (Rand McNally, 1964, p. 11,
lower). Values were extracted from the rainfall and temperature maps at the
same longitude and latitude points as each disease data point (i.e., at the
center of each respective province). Figure 3-2C shows a graph of two
factors,  rainfall  and  infection  rate.  Two  possible  curves  have  been  fitted 
to  the  data  on  this  graph,  one,  an  exact  fit,  and  the  other,  a  smoothed  fit. 
Figure  3-2D  shows  a  graph  of  three  factors:  temperature,  rainfall,  and  in¬ 
fection  rate.  This  last  graph  was  contoured  to  show  the  possible  application 
of  this  graphing  technique.  We  emphasize  that  our  purpose  here  was  to  ex¬ 
plore  potentially  useful  methods;  data  limitations  do  not  allow  conclusions 
regarding  specific  disease  situations. 
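
The difference between an exact fit and a smoothed fit can be sketched in Python as follows. The report does not state which fitting methods were used for Figure 3-2C, so the two shown here (a piecewise-linear curve through every point, and a least-squares straight line) are stand-ins chosen for simplicity, and the rainfall and infection-rate pairs are invented. The exact fit assumes that no two points share the same X value.

```python
def exact_fit(points, x):
    """Piecewise-linear curve that passes through every data point."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range of the data")


def smoothed_fit(points):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented (rainfall in inches, infection rate in percent) pairs.
POINTS = [(20, 2), (40, 10), (60, 6), (80, 14)]

slope, intercept = smoothed_fit(POINTS)
print(exact_fit(POINTS, 50))          # follows the local dip between 40 and 60
print(slope * 50 + intercept)         # follows the overall trend
```

The exact fit reproduces every irregularity in the data, while the smoothed fit trades fidelity at individual points for a clearer overall trend.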

The  MOD  study  team  did  not  concentrate  on  computerized  output  of  this 
kind  because  of  time  and  economic  constraints,  but  our  limited  studies  show 
the  technical  feasibility  and  potential  usefulness  of  this  kind  of  output. 

3.1.2  GRAPHS 

In  many  fields,  where  large  amounts  of  data  are  available,  graphs* 
showing  the  relationships  between  two  variables  are  more  useful  in  under¬ 
standing  these  relationships  than  are  tables  or  lists  of  numbers.  Graphs 
are  useful  in  several  ways:  when  the  equation  is  known,  the  graph  can  be 
used  to  explain  the  relationship;  when  the  equation  is  unknown,  the  graph 
can  indicate  what  the  equation  could  be. 


* A graph is simply a pictorial representation of the relationship between
variables and is, in a sense, a substitute for an equation representing this
relationship.

Ordinarily, the most useful graph (because of its relative simplicity)
is one which represents two variables. Three variables are much more
difficult to handle, but can be graphed by means of a "family of curves", each
curve representing a two-variable graph for the total situation, with the
third variable held at a particular (constant) value. A more informative
representation of three variables, however, is a contour plot, similar to a
contour-type map. Study of more than three variables can be facilitated by
carefully organized arrays, composed of either two- or three-variable graphs,
arranged side-by-side or, perhaps, overlaid one on another.

Any of these graphing techniques has potential (great) value in the
study of specific disease-environmental situations. To illustrate their
use, consider a hypothetical (realistic) situation in which we have a set of
geographic localities where we know the prevalence of human leptospirosis,
the density of the rat population, and the average pH of the surface water
(ponds, streams, etc.). Suppose, further, that the interrelationship among
these factors is such that, broadly speaking, there is more human
leptospirosis where two conditions exist together: the surface water is
slightly alkaline and there are many rats. The major difficulty in producing
graphs to show this relationship comes when one attempts to assign unique
single values to the graph points. Perhaps this can be resolved by averaging
several points or by narrowly restricting the geographical area from which
the graph points are taken.
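
A family of curves for the hypothetical leptospirosis situation, with averaging used to obtain a single value per graph point, might be sketched in Python as below. The density classes, pH values, and prevalence figures are invented, and the function name `family_of_curves` is introduced only for this illustration.

```python
from collections import defaultdict


def family_of_curves(records):
    """Group (density, ph, prevalence) records into one curve per density.

    Localities that share the same density class and pH are averaged so
    that each graph point carries a single value.
    """
    buckets = defaultdict(list)
    for density, ph, prevalence in records:
        buckets[(density, ph)].append(prevalence)

    curves = defaultdict(list)
    for (density, ph), values in sorted(buckets.items()):
        curves[density].append((ph, sum(values) / len(values)))
    return dict(curves)


# Invented localities: (rat density class, water pH, % infected).
RECORDS = [
    ("low", 6.5, 2),
    ("low", 7.5, 4),
    ("low", 7.5, 6),    # second locality with the same density and pH
    ("high", 6.5, 5),
    ("high", 7.5, 15),
]

for density, curve in family_of_curves(RECORDS).items():
    print(density, curve)
```

Each density class yields one two-variable curve (pH against prevalence), which is the arrangement shown in Fig. 3-2B.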

3.1.3  MAPS 

Speaking generally, a map is a representation of spatial or areal
relationships on the earth's surface. A map is drawn according to a
rigorous, logical, consistent grid pattern and scale, so that there is no
non-systematic distortion of size, shape, distance, and neighbors. These are
characteristics which other diagrammatic, pictorial, graphic representations
do not possess. (For example, a cartogram allows non-systematic distortions
in the size, shape, and neighbors of regions.) A map has two independent
variables, X and Y, represented by longitude ("LO") and latitude ("LA"),
respectively (or latitude/longitude equivalents), and portrays the
variations of a third dependent variable, Z, i.e., value ("VAL"), as height
above or below a standard plane. Maps and graphs are distinguished in that
graphs can use X and Y variables that do not represent (geographic) position.
Furthermore, a map is drawn orthogonally to a datum plane; i.e., everywhere
on the map, the viewer looks vertically downward toward the center of the
earth.


Maps are especially useful when a substantially large volume of
(appropriate) consistent, related data is available. Maps are also
particularly useful when data have a location characteristic (already
represented by two variables, longitude and latitude) and a third variable
which consists of the value of the data at that specific location. Under these
conditions, a simple two-variable graph is no longer adequate to show the
relationships. Maps have proved to be immensely useful tools in all
geographically-oriented fields of study. It is in recognition of these
important advantages that we have concentrated on maps in our considerations
of MOD output.

3.1.4  BLOCK  DIAGRAMS 

A potentially useful map-like representation of geographically
distributed data is the block diagram. This differs from a map (as defined
cartographically, Lobek, 1958) only in that it is constructed obliquely
(rather than perpendicularly) to a datum plane. Because block diagrams
resemble maps in so many ways, we will defer further discussion of them until
we have considered maps in detail.

3.2  MAP  CONSIDERATIONS 


Maps are very familiar means of presenting data, especially in simple
form; however, they can be very complex. In order to establish a basis for
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understanding  the  potential  uses  and  effectiveness  of  maps  —  and  their 
limitations  —  the  following  discussion  considers  various  types  of  maps, 
what  kinds  of  data  they  portray,  how  they  portray  these  data,  and  how  the 
maps  are  constructed. 


3.2.1  CATEGORIES  OF  MAPS 


There  are  so  many  kinds  and  uses  of  maps  that  it  would  be  virtually 
impossible  to  consider  them  all.  The  following  list  of  categories  (in¬ 
cluding  some  items  which  are  not  strictly  maps)  reflects  a  classification 
which we developed to catalogue those maps (conventional and computer
produced) that might prove useful to the MOD system:

(1)  Map  index/list/catalog 

(2) Outline map

(3)  Geographic  reference  map 

(4)  Air  navigation  chart 

(5)  Photo  (mosaic)  map 

(6) Vertical aerial photograph

(7)  Topographic  map 

(8)  Hydrographic  (nautical)  chart 

(9) Base (plat/survey/cadastral) map

(10)  Oil  company  road  map 

(11)  Earth-science  map  (geologic,  soils,  glacial  and 
glaciers,  weather,  climate,  mines/minerals, 
palinspastic, etc.)

(12)  Other  physical/chemical-environmental  map 

(13) Biogeographic map

(14)  Other  biological-environmental  map  (possibly 
agriculture,  forestry,  fishery,  etc.) 

(15) Disease map

(16)  Economic  map  (agriculture,  forestry,  fishery, 
manufacturing,  processing,  engineering,  public 
works,  transportation,  communications,  trade, 
commerce,  finance,  etc.) 
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(17)  Historical  map  (political/social-historical, 
paleogeographic, military/naval-historical, etc.)

(18)  Other  human-environmental  map  (population, 
language,  religion,  etc.) 

(19) Extraterrestrial map (star chart, etc.)

(20)  Other  types  of  maps 

3.2.2  USEFULNESS  OF  MAPS 


Why is a map important? Of what use is a map to a scientist studying
geographically-distributed factors? Brief consideration has already
been given to these questions in the Preface, but primarily in relation to
the MOD system. The following statements are a more general response to
the questions:

(1)  Maps  summarize  a  great  deal  of  information  as 
compared  to  tables  or  narrative  prose. 

(2)  Maps  enable  the  scientist  to  overcome  his  physical 
limitations  (especially  size)  and  to  see  the 
broader  spatial  relationships  and  characteristics 
in  the  world  about  him. 

(3)  Maps  are  valuable  means  of  communication  because 
the  data  are  presented  vividly  and  in  an  easily 
understood  visual  form. 

(4)  The  use  of  various  graphic  methods  to  represent 
data  permits  the  pattern  of  environmental  variables 
to be readily seen.

(5) Maps serve as a powerful means of generalization,
aiding  in  the  analysis  of  spatially/areally 
distributed  data. 

An appropriate general statement (Bick and Johnson,
1967, p. 1) is: "Maps are indispensable in earth
science studies because knowledge of the geographic
(areal) distribution of quantities, temperature
and air pressure, for example, is vital to
understanding the processes active on the earth. The
use of maps involves competence on the part of both
the compiler and the reader with respect to three
fundamental factors: an understanding of map
scales, how to determine position, and how to present
the data in a form that can be readily assimilated."
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Underscoring  the  great  potential  of  maps,  the  statement  has  been  made 
that cartography (i.e., the use of maps) has been as important to the
development of geographical science as mathematics has been to the development of
the  physical  and  engineering  sciences  (Bunge,  1962,  p.  33). 

Now that we have emphasized the advantages of maps, let us consider
the disadvantages. Maps attempt to present an entire picture from a limited
amount of data. All maps are constructed by interpolation techniques, of one
sort or another, from a finite (comparatively small) number of observed data
points. Map-making techniques are compromises between mathematically
rigorous portrayal and psychologically realistic portrayal. Furthermore, every
map is, to a greater or lesser extent, schematic and employs conventions,
and every map must be interpreted in the light of at least a general
knowledge of how it was made and the conventions employed. These facts do not
render maps invalid, but they do impose a responsibility on the user to
interpret them intelligently. No one type of map can possess all possible
virtues; it is a question of using that type of map which is best for a
particular purpose, combining a maximum of relative advantages with a
minimum of relative disadvantages, i.e., limitations.

In addition to limitations of the sort we have just described, there
may be errors in construction: in locating the X, Y (longitude, latitude)
of the data points or in determining the Z (value) of the data points.
These can represent observational errors (which depend on the method used to
measure the data points' values), sampling errors (when only a limited sample
can be taken at unevenly spaced locations), bias errors (in which a person,
subconsciously, prefers certain numbers over others), conceptual errors
(related to the validity or usefulness of the concept presented on the maps),
and, in contour maps, errors due to faulty approximation of the surface by
(linear or other) interpolation between known values.
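
Linear interpolation between known values, the simplest of the interpolation methods mentioned above, can be sketched in Python as follows. The function name `contour_crossing` and the sample coordinates are assumptions made for illustration; the sketch finds where a contour of a given value would cross the straight line joining two data points.

```python
def contour_crossing(p0, p1, level):
    """Locate where a contour level crosses the segment between two points.

    Each point is an (x, y, z) triplet. Returns the (x, y) position of the
    crossing, or None when the level does not fall between the two values.
    """
    (x0, y0, z0), (x1, y1, z1) = p0, p1
    if z0 == z1 or not (min(z0, z1) <= level <= max(z0, z1)):
        return None
    t = (level - z0) / (z1 - z0)   # fraction of the way from p0 to p1
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# Two invented data points with values 10 and 30, four units apart.
print(contour_crossing((0, 0, 10), (4, 0, 30), 20))
print(contour_crossing((0, 0, 10), (4, 0, 30), 40))
```

The result is exact only if the mapped surface really does vary linearly between the two points; where it does not, this step is the source of the interpolation error described in the text.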

3.2.3 CONVENTIONAL MAPS

Our purpose here is to point out several important characteristics of
conventional maps, characteristics that are not necessarily related.

A map portrays items symbolically, and much of cartographic symbolism
has grown up over the years to the point that it is now a well-established,
standardized set of conventions. One of the most important of these
cartographic conventions concerns the base information placed on a particular
map to show that minimum amount of geographic reference information
(coastlines, political boundaries, cities, rivers, etc.) necessary for the
viewer of the map to correlate the distribution of the factor being mapped
with familiar landmark points on the earth's surface. (Maps which are prepared
by a computer are usually either drawn upon or laid over a base map which
already contains most of these data.)

The kinds of data which can be mapped are unlimited, so long as the
data can be phrased in terms of X, Y, Z triplets (longitude, latitude, and
value). The major problem comes in selecting data points so that the
resulting map surface is most informative and of "reasonable" appearance.
For example, no fundamentally different techniques of mapping are involved
when making a contour map of (land-based) diseases in a coastal region than
are involved when making such a map for an interior region. But in the case
of the coastal region, the cartographer (human or computer) influences the
map he is making by modifying his set of data (not his cartographic methods)
to include a large number of data points out in the ocean, each point with
a value equivalent to "no disease present".
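
The coastal-region example can be sketched in Python as below. The field names `lo`, `la`, and `val` follow the LO/LA/VAL abbreviations used earlier in this section; the class `DataPoint`, the function `pad_with_ocean`, and all coordinates and values are hypothetical.

```python
from typing import NamedTuple


class DataPoint(NamedTuple):
    lo: float   # longitude (X)
    la: float   # latitude (Y)
    val: float  # mapped value (Z)


def pad_with_ocean(points, ocean_positions):
    """Return the data set extended with zero-valued points at sea.

    Only the data are changed; whatever contouring method is applied
    afterwards is the same as for an interior region.
    """
    occupied = {(p.lo, p.la) for p in points}
    extra = [
        DataPoint(lo, la, 0.0)
        for lo, la in ocean_positions
        if (lo, la) not in occupied
    ]
    return list(points) + extra


# Invented coastal data and three offshore positions.
LAND = [DataPoint(-38.5, -12.9, 25.0), DataPoint(-39.0, -14.0, 10.0)]
OCEAN = [(-37.0, -13.0), (-37.0, -14.0), (-38.5, -12.9)]

for point in pad_with_ocean(LAND, OCEAN):
    print(point)
```

The third offshore position coincides with an existing observation and is therefore not added, so that an observed value is never overwritten by an assumed zero.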

Logically,  a  map  represents  only  one  dependent  variable  (or  one 
disease-environmental  factor),  but  more  than  one  variable  can  be  represented 
either  by  a  series  of  maps  (each  displaying  a  distinct  or  unique  statistical 
surface)  or  by  overprinting  the  mapped  patterns  of  several  such  variables 
onto  the  same  base  sheet. 
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A  disease  map  is  one  that  shows  the  geographic  (areal  or  spatial) 
distribution of some clearly defined aspect (qualitative or quantitative) of
the total disease situation. Disease maps are abstract statistical surfaces,
constructed artificially according to the same conventions that define the
construction of conventional maps. The disease maps which have been
produced to date show principally the distribution of occurrence of a
particular item in the total disease situation, often with extensive narrative
comments on the maps themselves and/or in the legends (and the use of such
verbiage indicates a failure to present adequately, in map form, the medical
information).

Disease  maps  are  closely  related  to  maps  of  other  abstract  statistical 
surfaces, e.g., population density, rate of change of population density,
etc., which cannot actually be seen or observed directly in the field, but
which must be calculated from field observational data, in contrast to
road maps, topographic maps, type-of-bedrock geologic maps, etc.

The use of disease-environmental maps as a research tool is based on
the assumption that coincidence in the distribution patterns of two mapped
factors indicates a relationship (causal, associative, or coincidental)
between those factors. One of the most clear-cut instances in which mapped
patterns have been used to indicate important disease-environmental
relationships is to be found in the conclusion (Burkitt, 1962, p. 77-78) that
Burkitt's tumor, in Africa, occurs where, simultaneously:
the altitude is less than 5,000 feet; the seasonal mean temperature is
always greater than 60°F; and the total annual rainfall is greater than 20
inches (Fig. 3-3). From the apparent interrelationship among these factors,
Burkitt has suggested that this tumor may be caused by an insect-borne viral
agent. Another clear-cut instance in which a disease distribution pattern
(goiter) matches that of an environmental factor (iodine content of
drinking water) is shown in Figure 3-4. A third illustration (Kratchman and
Grahn, 1959) involves the correlation of deaths from congenital
malformations with higher-than-average levels of environmental radiation. A more
detailed consideration of the use of disease maps, particularly the types
that could be produced by the MOD system, is given in Section 8, Output
Usage.

Figure 3-3-A Distribution (circles) showing location of Burkitt's tumor in
Africa. (From Postgraduate Medical Journal, Vol. 33, p. 71; reproduced
with permission of the Editor and of the Author, D. Burkitt.)

Figure 3-3-B Area (shaded) where the following three conditions are met
simultaneously: altitude is under 5000 feet, seasonal mean temperature
always exceeds 60°F, and total annual rainfall exceeds 20 inches.
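
The simultaneous-condition overlay described for Burkitt's tumor can be sketched in Python as follows. The three thresholds are those quoted in the text; the grid cells and their altitude, temperature, and rainfall values are invented, and "seasonal mean temperature always greater than 60°F" is interpreted here, as an assumption, to mean that the lowest seasonal mean exceeds 60°F.

```python
def conditions_met(altitude_ft, lowest_seasonal_mean_f, annual_rain_in):
    """True where all three environmental conditions hold together."""
    return (
        altitude_ft < 5000
        and lowest_seasonal_mean_f > 60
        and annual_rain_in > 20
    )


def overlay(cells):
    """Return the names of the cells that satisfy every condition."""
    return sorted(
        name for name, factors in cells.items() if conditions_met(*factors)
    )


# Invented cells: (altitude in feet, lowest seasonal mean in deg F,
# total annual rainfall in inches).
CELLS = {
    "cell 1": (1200, 68, 45),
    "cell 2": (6500, 55, 40),   # too high and too cool
    "cell 3": (800, 75, 10),    # too dry
    "cell 4": (4900, 61, 21),
}

print(overlay(CELLS))
```

The cells returned correspond to the shaded area of a map such as Fig. 3-3-B; comparing them with the cells in which the disease is reported is then a matter of inspecting the two patterns for coincidence.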


3.2.5 SYMBOLIC REPRESENTATION ON MAPS


Data can be represented graphically on a map in three basically
different ways, and these may be combined. One may use:


• Dot-type symbols, such as actual dots or points,
numbers, letters, even small pictures. These
can be considered as zero-dimensional symbols
since, in practice, they approximate a geometric
point. (The term dot-type is used here
in its literal, descriptive sense. We realize
that, to a cartographer, a dot-type map is a
particular kind of statistical map which shows
density distributions by dots. We are using
"dot-type" as synonymous with "point value-" or
"data point-" or "point-type".)

• Shading-type symbols, such as various intensities
of grey, or various colors, or patterns.
These can be considered as two-dimensional
symbols since, in practice, they approximate a
geometric planar area. (Some maps using shading-type
symbols are also known as choropleth maps,
others as dasymetric maps.)

• Contour-type symbols, or contour lines (also
known as isarithms, isolines, or isopleths).
These can be considered as three-dimensional
symbols since a set of contour lines, in practice,
approximates a geometric curved surface.

•  A  fourth  method  of  representing  information  on  a 
map  utilizes  flow- line  symbols  such  as  directional 
arrows  iFig.  3-5),  which  may  be  considered  as  one¬ 
dimensional  symbols,  approximating  geometric 
lines.  This  method  is  mentioned  only  in  passing 
as  it  has  but  limited  application;  it  must  be 
used  in  combination  with  one  of  the  other  thr.e 
types  of  symbols  to  be  meaningful. 
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Figure 3-5 Flow line-type maps illustrating the routes by which "Asian
flu" spread over the world during February 1957 - January 1958: star (in
southern China) indicates probable origin; black (and white) dots show the
first wave of cases.

from History and Geography of Diseases by Henschen, F., 1962 (English
translation by Tate, 1966), A Seymour Lawrence Book published
by Delacorte Press; used with permission.
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It is appropriate to reconsider the meaning of data in the light of
the three kinds of symbolic representations we have just presented. When
numerical data are presented on maps, cartographers commonly speak of these
as statistical maps. Many of the data which relate to disease-environmental
situations are numerical (quantitative), many are qualitative. The following
division of symbols, on the basis of qualitative versus quantitative,
may be helpful.


SYMBOL                      QUALITATIVE          QUANTITATIVE

Point and Line              roads                dot distribution density
(dot-type)                  towns                  of population
                                                 flow lines

Area                        vegetation type      choropleth and other
(shading-type)              land-use type          shadings where numerical
                            land-form symbols      values have been assigned
                                                 circles
                                                 shadings

Contour                     nominal or ordi-
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3.2.5.1 Dot-type Maps  Dot-type (data point) maps are useful tools for
the study of various environmental factors, especially their qualitative
aspects, i.e., whether a particular factor is present (yes) or is not present
(no) at a particular place. (Quantitative aspects can be shown, however,
when different kinds of dots (size, color, etc.) are used to indicate
differences in amount.) Probably the best known dot-type maps are those
showing biogeographic distribution of various species, including human
population distribution maps.


To construct a dot-type map the cartographer specifies exactly what
disease or environmental factor he intends to map, and just how he will draw
the map. Next, he obtains an internally consistent, relevant set of data
points, each of which is expressible as a longitude, latitude, value (LO, LA,
VAL) triplet. Then he selects a sheet of paper appropriately gridded for
longitude (LO) and latitude (LA), places a dot on the grid according to the
(LO, LA) of each point, and writes the point's value (VAL) next to that dot.
Finally, he divides the total range of VAL's represented on the entire map
into several groups or intervals, selects an appropriate dot-type symbol for
each interval, and draws over each dot the appropriate symbol for its VAL.
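
The last step of this procedure (dividing the range of VAL's into intervals and choosing a symbol for each) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: equal-width intervals, the function names `interval_breaks` and `assign_symbols`, and the sample triplets are all hypothetical and are not taken from the MOD programs.

```python
from bisect import bisect_right

def interval_breaks(values, n_classes):
    """Equal-width boundaries splitting the range of values into n_classes intervals.

    Equal-width classes are an assumption; the text only says "several intervals".
    """
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + width * k for k in range(1, n_classes)]

def assign_symbols(triplets, symbols):
    """Attach a dot-type symbol to every (LO, LA, VAL) triplet according to its interval."""
    values = [val for _, _, val in triplets]
    breaks = interval_breaks(values, len(symbols))
    return [(lo, la, val, symbols[bisect_right(breaks, val)])
            for lo, la, val in triplets]

# Hypothetical data points: (longitude, latitude, value)
points = [(-60.0, -10.0, 0.0), (-55.0, -12.0, 10.0),
          (-50.0, -15.0, 25.0), (-45.0, -20.0, 40.0)]
for row in assign_symbols(points, [".", "o", "O", "#"]):
    print(row)
```

A value that falls exactly on a boundary is placed in the higher interval here; a cartographer might equally choose the lower one.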

In this consideration of dot-type (point-value) maps we have used the
sort of technique which would be necessary for any computer-programmed
system. In practice the (human) cartographer might not locate dots by
latitude and longitude, particularly if the density of data points was such
that this degree of precision was meaningless. The density of dots might
be determined by the data being plotted, e.g., one dot = one hundred people,
in which case the dots would be placed to represent the pattern in reality.
If there were only 100 people in a county, the single dot would be placed
where most people lived. The center of the (geographic) unit would be
chosen only if the distribution were uniform.

Figures 3-6 through 3-10 illustrate various dot-type maps, some from
published works, others produced by the MOD group. Figures 3-6 and 3-7, from
published papers, are dot-type maps that portray the distribution of several
environmental factors. Figure 3-8, in its published form, used dot-type
symbols of different colors as well as shapes to show the distribution of
leptospiral serotypes. Figure 3-9 presents our standard set of (schistosomiasis)
data as two manually drawn, dot-type maps; Figure 3-10 as computer/line-printer
output (using the Kansas Geological Survey trend-surface program).

3.2.5.2 Shading-type Maps  The "statistical surface" represented by a
shading-type map consists of a series of essentially horizontal planes that have
different elevations and that are separated by vertical cliffs or escarpments
(i.e., a step function). The different elevations are represented by
variations in shades of grey, or patterns, or color.
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Figure 3-6 Published, manually drawn, dot-type map showing the amount of
land in farms (related to the size of the circles), also the percentage of
that land available for crops (related to the area of the black sectors
within the circles).

from Elements of Cartography, 2nd ed., by Robinson, A. H., 1960,
published by John Wiley and Sons, Inc., New York and
reproduced with permission.
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crude oil gravity values, plotted on a line-printer (Harbaugh, 1964, p. 57;
courtesy of State Geological Survey of Kansas); B (lower), showing aircraft
positions during an interception, plotted on a Benson-Lehner plotter
(courtesy of Bell Telephone Laboratories).
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Figure  3-8  Section  of  published,  manually  drawn,  dot-type  map  portraying 
the  occurrence  of  various  leptospiral  serotypes  in  different  countries;  the 
presence  of  each  serotype  is  symbolized  by  a  dot  of  a  different  shape  and 
color  on  the  original  map. 
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Figure 3-10 Dot-type map produced during MOD study from the standard
set of South American schistosomiasis data (Figure 3-1), utilizing Kansas
Geological Survey trend-surface program running on an IBM 7090 computer
with output on a line printer (outline of the continent was added manually).
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Shading-type maps have also proved to be quite useful tools in studying
various environmental factors, especially in relating a qualitative factor
to an area location. (As with dot-type maps, quantitative aspects can also
be shown.) Well-known examples of shading-type maps are the large- to
small-scale bedrock geology maps published for most states and countries by
governmental geological surveys.

To construct a shading-type map the cartographer specifies exactly
what disease or environmental factor he intends to map, and just how he will
draw the map. Next, he obtains an internally consistent, relevant set of
data points, each of which must be expressible in the form of (LO, LA, VAL)
triplets. Then he selects a sheet of paper appropriately gridded for LO and
LA, draws the outlines of the units (political or otherwise) on the (LO, LA)
grid, and enters the VAL for each area (determined by grouping procedures).
Finally, he divides the total range of VAL's represented on the entire map
into several intervals or groups, selects an appropriate shading-type symbol
for each interval, and shades or colors each unit with the appropriate
symbol for its VAL.
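
The grouping and shading steps can be sketched as follows. The sketch assumes that the "grouping procedure" is a simple mean of the VAL's falling in each unit, and that the units are 10-degree grid squares; both are illustrative assumptions, as are the names `grid_cell`, `unit_values`, and `shade_units`.

```python
import math
from collections import defaultdict
from statistics import mean

def grid_cell(lo, la, size=10.0):
    """Hypothetical areal unit: the size-degree square containing the point."""
    return (math.floor(lo / size), math.floor(la / size))

def unit_values(triplets, unit_of=grid_cell):
    """Group (LO, LA, VAL) triplets by unit and average VAL within each unit."""
    grouped = defaultdict(list)
    for lo, la, val in triplets:
        grouped[unit_of(lo, la)].append(val)
    return {unit: mean(vals) for unit, vals in grouped.items()}

def shade_units(values_by_unit, shades):
    """Pick one shading symbol per unit from equal-width intervals of the unit values."""
    low = min(values_by_unit.values())
    high = max(values_by_unit.values())
    width = (high - low) / len(shades) or 1.0  # avoid dividing by zero when all units are equal
    return {unit: shades[min(int((val - low) / width), len(shades) - 1)]
            for unit, val in values_by_unit.items()}

# Hypothetical data points: (longitude, latitude, value)
points = [(-61.0, -11.0, 10), (-65.0, -15.0, 30), (-41.0, -11.0, 80)]
values = unit_values(points)
print(values)
print(shade_units(values, [" ", ".", "#"]))
```

Every location inside a unit receives the same shade, which is the step-function surface described above.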

Figures 3-11 through 3-15 illustrate various shading-type maps. Some
of these are from published works, others were produced by the MOD group.
Figure 3-13, taken from the published medical literature, shows how shading
techniques can be used to present various medical data. Figure 3-14 shows
the standard set of (schistosomiasis) data presented in the form of a
shading-type map. Figure 3-15 portrays the same data as output by a
computer/line-printer configuration, using a simple program which we prepared.

3.2.5.3 Contour-type Maps  Contour-type maps have proved to be very useful
tools for the study of various environmental factors, particularly those
considered by the several earth sciences. They are also very well suited
for the study of many disease situations. This is because contour maps
present quantitative as well as qualitative aspects.
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Figure 3-12 Examples of machine-drawn, shading-type maps: A, done by
computer/line-printer configuration (Fisher et al. 1967, courtesy of
Laboratory for Computer Graphics, Harvard University); B, done by
computer/plotter configuration.

from Computer Representation of Planar Regions by their Skeleton, by Pfaltz
and Rosenfeld: Communications of the ACM, Vol. 10, No. 2 (Feb.), 1967;
reproduced with permission.
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Figure 3-13 Published, manually drawn, shading-type map displaying
statistics that represent infant mortality in part of Great Britain.

from National Atlas of Disease Mortality in the United Kingdom, 1963
by Howe, G.M., reproduced with permission of the publisher,
Thomas Nelson and Sons Ltd., Middlesex.
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Figure 3-15 Shading-type map of standard set of schistosomiasis data
(Figure 3-1) output on a line printer after processing by an IBM 7090
(outline of South America added manually).
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Dot-type and shading-type maps are (intuitively) understood by most
biomedical and data-processing oriented people with whom the study team has
conversed, but this was not the case with contour-type maps. For this
reason, and because contour maps are so important in the MOD project, we
present here a somewhat detailed explanation of what contour-type maps mean
and how they are constructed.

Well-known examples of contour-type mapping include the large-scale
topographic maps so widely used in the earth sciences, and weather maps.
Basically, a contour-type map is a device by means of which a three-dimensional,
complex geometrical figure (real or imagined) can be represented on
a two-dimensional, simple plane surface. The map accomplishes this by a
"set of lines", i.e., contours (sometimes called isolines, isarithms or
isopleths) which outline, according to well-defined rules, the shape of
that complex geometrical figure. Each contour is a line drawn on the map
or chart connecting points of equal value. This line often represents
points of equal elevation above (or below) some assumed base elevation, but
it may also represent points of equal temperature or humidity or population
density or disease prevalence.


Because the geometrical figure being represented by the map is
three-dimensional, that figure can be treated as a set consisting of many ordered
triplets of numbers $(X_i, Y_i, Z_i)$. For each specific pair $(X_s, Y_s)$, one, and
only one, Z value $(Z_s)$ exists; i.e., $Z = F(X, Y)$.

Contour techniques can be used to represent the form of any geometrical
surface of the form $(X_i, Y_i, Z_i = F(X_i, Y_i))$. The variables X and Y,
theoretically, can represent values for any conceivable independent
disease-environmental factors. Under these conditions the result is a graph showing
the relationship among three disease-environmental factors. In order
to make a contour map of a particular disease-environmental factor, X is
taken to be the LO (longitude) of a geographic point, Y the LA (latitude),
and Z the VAL (value) of the specific factor at that particular (X,Y)
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point locality. Thus, in making a map, the X, Y, Z triplet becomes a
special case; it becomes a LO, LA, VAL triplet.
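
The requirement that one, and only one, VAL exist for each (LO, LA) pair can be checked mechanically before any contouring is attempted. The following is a minimal sketch; the function name `as_surface` and the sample triplets are hypothetical.

```python
def as_surface(triplets):
    """Store (LO, LA, VAL) triplets as a mapping (LO, LA) -> VAL.

    Raises ValueError if the same locality carries two different values,
    since the data would then not describe a single-valued surface.
    """
    surface = {}
    for lo, la, val in triplets:
        key = (lo, la)
        if key in surface and surface[key] != val:
            raise ValueError(f"two different VALs at locality {key}")
        surface[key] = val
    return surface

# Hypothetical data points: (longitude, latitude, value)
surface = as_surface([(17.0, 41.0, 73), (17.5, 41.5, 5)])
print(surface[(17.0, 41.0)])
```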

Now that we have considered these general principles, let us examine
a standard topographic contour map. On such a map, LO (=X) and LA (=Y) are
obvious and need no elaboration. VAL (=Z) is taken to be the elevation in
feet above or below the datum plane of mean sea level. The relationship
between an actual land surface and its representation as a contour-type
map is illustrated in Figure 3-16. The remainder of this discussion will
be based upon actual construction of a contour-type topographic map.
Although this illustration uses VAL's of elevation, any disease-environmental
factor which could be assigned a unique value (VAL) at each specific
geographic locality could be contour mapped in precisely the same way, e.g., to
show the infection rate of Schistosoma mansoni in man, or the mean total
annual rainfall.

To describe the production of a contour-type map, we will use a
specific example. We will construct a contour map (with 10-foot contour
intervals) showing land elevation (in feet) above mean sea level over a
small area containing a twin-peaked hill (Fig. 3-17A). We will use the
data contained in Fig. 3-17B, which were actually taken from Fig. 3-17A,
and a square (LO, LA) grid. [The ► symbols in the margin mark discrete steps.]

► First the cartographer specifies exactly what disease-environmental
factor he intends to map and just how he will draw the map (selects contour
interval, etc.).

► Next, he obtains an internally consistent, relevant
set of data points expressed in the form of LO, LA, VAL triplets, and
selects a sheet of paper, appropriately gridded for LO and LA.

► Then, just as with dot-type or shading-type maps, he plots each data
point on the (LO, LA) grid and writes the point's VAL next to that dot. By
this operation, the data of Figure 3-17B become the dot-type map shown in
Figure 3-18. But from this step on, the procedure differs from those
described before.
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► After all the data points have been inserted, the cartographer scans
(visually) to note the highest- and lowest-valued points, also any broad
trends that the data point values may suggest. In our example (see Fig. 3-18),
the highest-valued data point is the 73-foot point near 41°25', 17°22', and
the lowest-valued data point is the 5-foot point near 41°45', 17°29'. The
data points indicate a high area in the left-central and central-central
portions of the map and imply that the surface slopes downward sharply north,
northeast, and east of this high area, and more gently south from the high
area. Another important observation is that the data points appear to be
most densely distributed in the upper-left quadrant of the map.

► After this preliminary evaluation, the cartographer decides which
contour line will probably be easiest to draw and, at the same time, will
suggest the overall shape of the surface being contoured. The highest-valued
data point is commonly chosen and successively lower-valued contours drawn
around it. (Occasionally the lowest-valued data point is a better choice.)
Sometimes the most nearly middle-valued contour line is selected, tracing
it through the field of data points.

► The cartographer then picks out the
data point most appropriate in view of this choice (above), and draws
straight lines from that data point to the several (perhaps five to ten)
surrounding data points which are nearest.

► Taking the VAL's of the two data
points at the end of each line, he interpolates (and marks the position of)
the contour-line values along each line. In Figure 3-19, this process has
been performed for two data points, the left-hand one representing an
attempt to locate the highest-valued contours (here, the 70-, 60-, and
50-foot contours), and the right-hand one representing an attempt to locate
part of the middle-valued contour (here, the 30-foot contour).
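
The interpolation along a single line between two known data points can be sketched as below. Straight-line (linear) interpolation is assumed, which is what marking positions proportionally along a ruled line amounts to; the function name `contour_crossings` and the sample coordinates are hypothetical.

```python
import math

def contour_crossings(p, q, interval):
    """Locate contour levels along the straight line from p to q.

    p and q are (LO, LA, VAL) triplets; contour levels are taken to be
    whole multiples of `interval`. Returns a list of (LO, LA, level)
    positions found by linear interpolation between the two end values.
    """
    (x0, y0, v0), (x1, y1, v1) = p, q
    if v0 == v1:
        return []  # no change in value along the line, so nothing to interpolate
    low, high = sorted((v0, v1))
    level = math.ceil(low / interval) * interval  # first level at or above the lower end
    crossings = []
    while level <= high:
        t = (level - v0) / (v1 - v0)  # fraction of the way from p to q
        crossings.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0), level))
        level += interval
    return crossings

# A 73-foot point and a 43-foot point ten grid units apart, 10-foot interval
for position in contour_crossings((0.0, 0.0, 73), (10.0, 0.0, 43), 10):
    print(position)
```

With these two end values the 50-, 60-, and 70-foot levels fall on the line, the 70-foot mark lying one tenth of the way from the higher point.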

Parenthetically, note that only the originally plotted data points can
be considered to be "known" points; between these known data points are an
infinite number of points, each with determinable LO and LA, but with
unknown VAL. If a contour map of the surface under consideration is to be
made, this surface must be assumed to be a reasonably smooth, regular
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Figure 3-17 A, land surface, and B, table of data points, used to
demonstrate procedures of constructing contour-type maps (Figures 3-18
through 3-23).
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geometrical  surface.  Only  under  these  conditions  is  it  meaningful  to  assign 
interpolated  (inferred)  numerical  VAL's  to  the  many  other  loci  distributed 
among  the  few  known  data  points,  and  every  unknown  point  must  be  assigned 
a  mathematically-manipulatable,  numerical  value  (rather  than  a  "?"  value). 

Then the cartographer continues in this manner, selecting other
appropriate (known) data points and interpolating contour-line values around
them, building up a large number of rather closely spaced interpolated data
points to supplement the known ones. This phase is illustrated by Figure
3-20 in which interpolated data points with VAL's of 70, 60, and 30 feet
are indicated. (Interpolated data points with other VAL's have been left
off the map for purposes of clarity.)

At this stage the cartographer draws a line connecting successively
adjacent, known and/or interpolated data points with identical VAL's, and
draws lines in this fashion for each VAL to be contoured. These lines are
irregular at this stage and can be considered as preliminary contour lines.
The 70-, 60-, and 30-foot preliminary contour lines of our example are shown
in Figure 3-21; as before, the other contour lines have been omitted for
ease of visualization.

The final step comes when the cartographer (visually) inspects the
preliminary contour lines, then smooths them, exactly as one draws a smooth
curve through a set of data points presented on a graph. Assumed (slight)
inaccuracies in interpolated data points' values, coupled with the presumed
regularity of the surface being contoured, permit this smoothing to be done
with no real loss of accuracy. The smoothly curved lines resulting from
this last operation can be considered to be final contour lines. The 70-,
60-, and 30-foot final contour lines of our illustrative example are shown
in Figure 3-22. Figure 3-23 shows all the final contour lines, i.e., the
completed contour-type map; for comparison, the original land surface from
which the data (Fig. 3-17B) were taken is shown in Figure 3-17A.
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In drawing the preliminary and final contour lines, the cartographer
proceeds in accordance with well-defined rules (as outlined by Ohman, 1963).
These are illustrated in Figure 3-24.

(1) Most commonly, the contour line will be drawn so as
to connect successively the closest geographic point
localities that possess the same known or interpolated
VAL (Fig. 3-24A).

(2) When points with different VAL's intervene between
the two closest points being used to construct a
particular contour, the contour line must fit
around the first-mentioned points so as to connect
points other than the closest points with that
particular VAL (Fig. 3-24B). Contour lines should
be "smooth" to convey a meaningful concept of the
smooth surface which they are attempting to portray.

(3) Contours must be capable of closing either on or
off the map (Fig. 3-24C). Contour lines depicted
on two separate but adjacent, contiguous maps must
connect when the edges of the maps are put together.
(Obviously, the form of the land surface does not
vary according to how the boundaries of the
topographic quadrangle maps showing the land are
arranged.)

(4) An elongated ridge crest, representing a reversal
of slope or gradient, should be shown by two
opposing contour lines of the same VAL, rather than
by a single contour line (Fig. 3-24D). This is
because, in nature, no ridge crests exist which are of
precisely the same elevation (VAL) as a particular
contour for a distance great enough to be shown
on a map.

(5) When the points being used to draw contour lines
are located on the corners of a square, it may be
that two possible sets of contours can be drawn for
them (Fig. 3-24E). Such a situation can be resolved
by interpolating a fifth point at the center of the
square with a value (VAL) that is the average of the
two possible interpolated values between the pairs
of points on opposite corners of the square (Fig. 3-24F).

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(6)  A  contour  line  must  never  be  a  dangling  spiral 
(Fig.  3-24G). 

(7)  Contour  lines  used  to  map  a  particular  factor 
must  never  cross.  Obviously,  referring  to  our 
topographic  map,  a  particular  geographic  point 
cannot  be  both  10  feet  and  20  feet  above  sea  level. 
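
Rule (5) lends itself to a short sketch. It assumes that the "interpolated value" between two opposite corners is their midpoint value, so that the fifth (center) point is the average of the two diagonal midpoints; the function names and corner labels (sw, se, ne, nw) are hypothetical.

```python
def center_value(sw, se, ne, nw):
    """Fifth-point VAL at the center of the square, as described in rule (5)."""
    first_diagonal = (sw + ne) / 2   # midpoint value between one pair of opposite corners
    second_diagonal = (se + nw) / 2  # midpoint value between the other pair
    return (first_diagonal + second_diagonal) / 2

def is_ambiguous(sw, se, ne, nw, level):
    """True when the corners alternate above and below the contour level."""
    above = [v > level for v in (sw, se, ne, nw)]
    return above in ([True, False, True, False], [False, True, False, True])

def resolve(sw, se, ne, nw, level):
    """Say which pair of opposite corners the center point ties together."""
    if not is_ambiguous(sw, se, ne, nw, level):
        return "unambiguous"
    if center_value(sw, se, ne, nw) > level:
        return "high corners joined"  # the center lies above the level, between the high corners
    return "low corners joined"       # the center lies at or below the level

print(resolve(45, 20, 45, 20, 30))
print(resolve(35, 20, 35, 20, 30))
```

When the center value happens to equal the contour level exactly, the sketch arbitrarily joins the low corners; the rule as stated does not cover that case.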

Contour  lines  greatly  aid  comprehension  in  that  they  tie  together 
points  of  comparable  value  and,  in  addition,  show  rates  of  change,  thereby 
stressing  relationships  which  might  otherwise  be  less  apparent.  Such  in¬ 
formation  is  completely  lost  with  shading-type  maps  since  all  values  in  an 
area  are  combined  to  produce  an  average  —  a  value  which  may  not  even  exist, 
as  such,  in  the  area. 

Thus, shading-type maps (and to even greater extent dot-type maps) present
quite limited information about disease-environmental relationships because
of problems involved in grouping data, problems which often lead to marked
distortion of place/quantity relationships. (An exception is the special
case where, using high resolution (large scale) maps, every case (or small
group of cases) of a particular disease is precisely located by a dot.
This technique is widely used, and very successfully, by epidemiologists.)
This seems an adequate explanation of why disease-environmental maps have
not been more widely used. The method of contour mapping gets around the
most serious of the data grouping problems (mentioned above), but makes
much greater demands of the data in terms of "completeness".

Various contour-type maps are shown in Figures 3-25 through 3-29,
including one that deals with biomedical problems (Fig. 3-27). (There have
been relatively few contour maps produced that deal with
disease-environmental data. The problems which we have encountered in developing the MOD
system, particularly those dealing with data structuring, would explain
this.) Contour maps produced by the MOD system, based upon the standard set of
(schistosomiasis) data, are also shown. Figure 3-28 is one that was drawn
manually; Figure 3-29 was produced by a computer. Further examples will be
presented later in the discussion of computerized mapping programs.
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Figure 3-26 Examples of machine-drawn,
contour-type maps: A (upper),
showing 1960 human population of Ann
Arbor, Michigan, output on a line
printer (Tobler, 1966, p. 7; courtesy
of W.R. Tobler), and B (lower), showing
hydrographic data (Tobler, 1964, p. 4;
courtesy of W.R. Tobler).
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areal variations of standardized mortality ratios calculated for Australian
human males who died from arteriosclerotic and degenerative heart disease
during 1959-1963 (Learmonth & Nichols, 1965).

from Maps of some standardized mortality ratios for Australia, 1959-1963:
Occasional Paper No. 3, by Learmonth, A.T.A. and Nichols, G.C.,
1965, The Australian Natl. Univ., Canberra; reproduced with
permission.
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3.2.5.4 Combination-type Maps  The basic types of maps we have described
can be combined to increase understanding of the data. Figures 3-30 through
3-34 show schematically various combinations of dot-, shading-, and
contour-type symbols. Figures 3-30 and 3-31 illustrate combination-type maps of
particular environmental factors. The standard set of (schistosomiasis)
data mapped by the MOD team, using various methods, is shown in Figure 3-33
as a manually-produced map on which a combination of dot-, shading-, and
contour-type symbols is used. These data appear again in Figure 3-34 as a
computer-produced map utilizing dot- and contour-type techniques.
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One of the most important tasks in preparing data for mapping is to
determine the best representation of the location (on a LO, LA coordinate
basis) for each datum to be mapped. This is because disease factors
cannot be measured in the same way as points for a topographic map. In
topography, a point location has a single exact value which is directly
measurable, and related only to that one location (e.g., 2,398 feet above
sea level). Disease infection rate, on the other hand, must be calculated
for an area by counting the total number of animals (including human beings)
in the area and dividing the number of infected animals (of the
same species or group) in the area by that total. The value thus derived is
absolute only with reference to the area boundaries, and would change if area
boundaries changed (even though the same point location might be used for
the data location in each case). Difficulties often occur when data from
different size areas are combined on one map. The larger the area considered,
the less variation would show on the map. To carry this to an extreme, the
area covered by the whole map could be represented by one point. This
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from Goode's World Atlas, 12th ed., 1964
Copyright by Rand McNally & Co., R.L. 68 S 86;
used with permission.

Figure 3-30 Published map combining contour-type (the lines separating
differently colored regions) and shading-type (the colors between the
contours reproduced here as different shades of gray) symbols showing
population data.
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Figure 3-31 Machine-computer-drawn combination-type maps: A (upper),
utilizing combination of shading-type (the alternating bands of characters)
and contour-type (the boundaries between adjacent white and black bands)
symbols to portray crude oil gravity data, output on line-printer
(Harbaugh, 1964, p. 56; courtesy of State Geological Survey of Kansas);
and B (lower), utilizing contour-type and dot-type symbols to portray
hydrographic data, output on a CalComp plotter (courtesy of California
Computer Products, Inc.).
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Figure 3-32 Published, manually drawn map using combination of dot-,
shading-, and contour-type symbols to portray urban infant mortality
(number of deaths per 1000 live births) in India in 1958 (Learmonth, 1965).

from Health in the Indian Subcontinent, 1955-1964: Occasional Paper
No. 2, by Learmonth, A.T.A., 1965, The Australian Natl. Univ.,
Canberra; reproduced with permission.



mapping  techniques. 


MAPPING  OF  DISEASE 


Figure 3-34 The standard set of South American schistosomiasis data
(Fig. 3-1), presented as a combined dot-type (the X's) and contour-type
map, drawn by a CDC 3600 computer utilizing an offline ink-on-paper
CalComp plotter and the Control Data Corporation's gridding/contouring
program with a fine grid (outline of continent added manually).
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should not be used in combination with country grouping. When data are
grouped in an artificial manner such as this, the user must be aware of
their limitations. Disease is no respecter of political boundaries, and
disease data that are grouped (and mapped) by country or province (often a
technical necessity) may be quite misleading. For example, knowing that a
county has a 25% overall infection rate of a specific disease would not
indicate that a particular village within that county had an 80% infection
rate whereas the surrounding rural area had an infection rate of less than
1%. The illustrations on the following two (insert) pages, 3-61A and
3-61B, emphasize some of these limitations.
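
The village-and-county example can be made concrete with a small calculation. The head counts below are invented purely for illustration (they are chosen so that the pooled rate comes out near 25%), and the function names are hypothetical; the rate itself is simply the number infected divided by the total number examined.

```python
def infection_rate(infected, total):
    """Infection rate for one area: number infected divided by total number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return infected / total

def pooled_rate(groups):
    """Rate for a larger unit formed by pooling several (infected, total) groups."""
    infected = sum(i for i, _ in groups)
    total = sum(t for _, t in groups)
    return infection_rate(infected, total)

# Invented counts: (infected, total)
village = (240, 300)   # 80% infected
rural = (5, 700)       # under 1% infected

print(f"village {infection_rate(*village):.1%}")
print(f"rural   {infection_rate(*rural):.1%}")
print(f"county  {pooled_rate([village, rural]):.1%}")
```

The single county figure describes neither the village nor the countryside, and it would change if the county boundary were redrawn to include or exclude the village.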

Despite  the  limitations  we  have  mentioned,  data  must  often  be  grouped 
and  related  to  areas  larger  than  desired,  and  we  must  do  the  best  we  can 
with  them.  One  of  the  many  possible  methods  of  handling  such  a  problem 
would  be  to  locate  the  grouped  data  at  a  "center  of  gravity"  based  upon 
population  distribution.  Ideally  the  MOD  system  would  use  regularly  shaped 
areas,  preferably  square,  forming  a  grid,  as  a  basis  for  map  production  — 
and  the  grid  squares  would  be  adjusted  (in  size)  as  necessary  to  reflect  a 
reasonably uniform disease-environmental situation. All data within a
given square could be combined at the center by calculating an inverse
distance function for each datum value. Squares with no data could, if
desired, be exempted from further consideration in mapping. Unfortunately,
this concept is not universally accepted because, in non-computerized
mapping, gridding data points is considered to be unnecessary (as we
discussed in 3.2.5.1 in relation to dot-type maps).
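
The "center of gravity" placement mentioned above can be sketched as a population-weighted mean position. This is only an illustration under stated assumptions: the function name, the sample province, and the use of a simple weighted average of longitude and latitude (reasonable only for areas small enough that earth curvature can be ignored) are ours and are not part of the MOD system.

```python
def center_of_gravity(places):
    """Population-weighted mean position of a group of places.

    `places` is a list of (longitude, latitude, population) tuples.
    Returns (longitude, latitude) of the weighted center.
    """
    total = sum(pop for _, _, pop in places)
    if total <= 0:
        raise ValueError("total population must be positive")
    lon = sum(lo * pop for lo, _, pop in places) / total
    lat = sum(la * pop for _, la, pop in places) / total
    return lon, lat


# Hypothetical province: one large town and two small villages.
province = [(-67.0, 10.0, 8000), (-66.0, 9.0, 1000), (-65.0, 8.0, 1000)]
print(center_of_gravity(province))
```

The grouped data value for the province would then be plotted at the returned position, which lies close to the large town rather than at the geometric center of the province.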

Map  scale  and  map  projection  warrant  consideration  before  we  describe 
the actual construction of disease maps. Map scale is expressed in three
ways: verbally (e.g., one inch equals 40 miles); graphically, by a scale
which is drawn on the map (a line divided into units which represent actual
distances on the ground); and fractionally, where the scale is indicated by
a ratio, e.g., 1/25,000. Map scale is determined by the area to be mapped
and the desired dimensions of the map. These determinations are, in turn,

An illustration of the misleading information that can come when
disease data is expressed in relation to political (unit) areas,
without consideration of actual (mathematical) areas.

An illustration of how disease prevalence or incidence figures
can be very misleading when related to political (unit) areas,
even though they may be of equal size.

An illustration of how one might form a false impression from
disease data reported by political areas. In this example
the disease is altitude dependent, and its distribution related
to a mountain, not to the artificial boundaries of
Provinces A, or B, or C, or D. Although we have not shown it
here, this altitude relationship would be immediately apparent
upon overlaying the (transparent) disease map on a base
topographic map.


Obviously, disease prevalence or incidence data, when presented in
dot-type or contour map form (as shown in figures A, B, and C), would
not be subject to the type of misinterpretation that could arise if the
data were reported simply as figures, related to political areas.

Copyright by the University of Chicago;
reproduced with permission.

Figure 3-35 The Western Hemisphere mapped on map projections found to
be most useful to the MOD system: A, equirectangular projection; B,
Goode's homolosine projection; C, Miller cylindrical projection; D,
Mercator projection; E, cylindrical equal-area projection. Note that in
E, distortion is minimal in the tropical zone.

From Elements of Cartography, 2nd ed., by Robinson, A. H., 1960,
published by John Wiley and Sons, Inc., New York, and
reproduced with permission.

influenced by whether or not the dimensions of the desired map exceed the
limitations of paper size and how many points will probably be used. For
example, if the data were recorded only to the nearest ten miles, a large
scale (small area) map could contain very large errors. As the size of
the area covered on a map becomes smaller, the scale increases and
"resolution" increases; conversely, the larger the area covered, the smaller
the scale, and "resolution" decreases. These are, obviously, very practical
considerations. For example, it would contribute very little to use (small
area) local data on a map of the world unless it were first combined with
data in adjacent areas.
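
The relation between the verbal and fractional forms of scale, and the effect of scale on recording error, can be checked with a little arithmetic. The sketch below is an illustration only; the function names are ours, and the only outside fact used is that a statute mile contains 63,360 inches.

```python
INCHES_PER_MILE = 63360  # 5,280 feet x 12 inches


def fractional_scale(miles_per_inch):
    """Denominator of the fractional scale for 'one inch equals N miles'."""
    return miles_per_inch * INCHES_PER_MILE


def map_error_inches(ground_error_miles, scale_denominator):
    """Size on the map sheet of a given positional uncertainty on the ground."""
    return ground_error_miles * INCHES_PER_MILE / scale_denominator


# 'One inch equals 40 miles' expressed as a fraction: 1/2,534,400.
print(fractional_scale(40))

# Data recorded to the nearest ten miles, drawn at two different scales.
print(map_error_inches(10, fractional_scale(40)))  # small-scale map
print(map_error_inches(10, 25000))                 # large-scale map, 1/25,000
```

At one inch to 40 miles a ten-mile uncertainty occupies a quarter of an inch on the sheet; at 1/25,000 the same uncertainty spans roughly 25 inches, which is the sense in which a large-scale map "could contain very large errors."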

Map projection also poses several problems. No one map projection
is best for all purposes. Each type of projection shows certain
characteristics of the earth better than do other projections, and each
projection distorts some characteristics of the earth's surface.
Distortions of size and of shape are two major kinds of effect. Since MOD
users will probably be comparing distribution patterns, the best projection
(ordinarily) is one that minimizes distortion of areas and shapes (with
less concern for linear distances, angles, and directions).

Using these criteria, the homolosine projection (Fig. 3-35B) is
probably the most suitable one for maps of the world as a whole; however,
two other projections should be considered: the Mercator projection (Fig.
3-35D) and the similar-appearing Miller cylindrical projection (Fig. 3-35C).
Even though both distort area and shape rather markedly, they are widely
used and, thus, familiar projections; furthermore, many data are already
available, mapped according to these projections. The MOD
computer-produced maps in this report were all produced at varying scales, but with
no manipulations of LO, LA (i.e., LO = X and LA = Y), and can be considered
as examples of equirectangular projections (see Fig. 3-35A). The choice
of a suitable projection for areas smaller than a continent does not
present as great a problem as it does for the world as a whole, since all
projections tend toward the equirectangular, as a limit, as the region
mapped becomes smaller.

The cylindrical equal-area projection seems a good general compromise
for a standard MOD projection. It is the only equal-area projection with a
rectangular grid (see Fig. 3-35E). While it looks unusual when standard
parallels just under 30° are chosen, it provides the least mean deformation
of any equal-area world projection (Robinson, 1960, p. 73).
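
The difference between plotting LO and LA directly and using a cylindrical equal-area projection can be illustrated numerically. The following is a minimal sketch, not MOD code: the function names are ours, the earth is treated as a unit sphere, and the default standard parallel of exactly 30° is an assumption standing in for the "just under 30°" mentioned above.

```python
import math


def equirectangular(lon_deg, lat_deg):
    """Use longitude and latitude directly as plot coordinates (LO = X, LA = Y)."""
    return lon_deg, lat_deg


def cylindrical_equal_area(lon_deg, lat_deg, std_parallel_deg=30.0, radius=1.0):
    """Cylindrical equal-area projection with a chosen standard parallel.

    x = R * lon * cos(phi0),  y = R * sin(lat) / cos(phi0)
    """
    k = math.cos(math.radians(std_parallel_deg))
    x = radius * math.radians(lon_deg) * k
    y = radius * math.sin(math.radians(lat_deg)) / k
    return x, y


def cell_area(project, lon0, lat0, size_deg=10.0):
    """Map-sheet area of a size_deg x size_deg cell with lower-left corner (lon0, lat0)."""
    x0, y0 = project(lon0, lat0)
    x1, y1 = project(lon0 + size_deg, lat0 + size_deg)
    return (x1 - x0) * (y1 - y0)


# Compare a 10-degree cell at the equator with one starting at 60 degrees north.
for lat in (0.0, 60.0):
    print(lat,
          cell_area(equirectangular, 0.0, lat),
          cell_area(cylindrical_equal_area, 0.0, lat))
```

On the equirectangular plot both cells occupy the same area on the sheet although the northern cell is much smaller on the earth; on the equal-area projection the plotted areas shrink in proportion to the true areas, at the cost of distorted shapes at high latitudes.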

3.2.7 MAP CONSTRUCTION

We  have  considered  in  some  detail  the  characteristics  of  maps  and 
the  methods  of  mapping.  As  we  have  illustrated,  disease  data,  once  col¬ 
lected  and  structured,  can  be  mapped  by  conventional  (manual)  methods, 
using  these  same  standard  cartographic  techniques.  Let  us  now  consider 
the  methods  by  which  computers  produce  maps. 

There are no new cartographic principles involved in the production
of maps by computer, and we are still limited to three basically different
ways of representing the data: using dot-type (i.e., data-point) symbols,
or shading-type symbols, or contour-type symbols. These three types of
symbols can be inserted on a map by computer, using several different
output devices (and these devices are described in 6.1.1):

•  By  high-speed  (line-)  printers, 

•  By ink-on-paper automatic plotters,

•  By cathode-ray-tube (CRT) devices, displayed
directly  for  viewing  and/or  recorded  on  film. 

The way that data symbols are mechanically inserted is mainly a
technical problem; not so the construction of data points and the
determination of how many and where to place them. Because
disease-environmental data are (qualitatively) quite different from the type
of data which have been computer mapped, and because of (quantitative)
limitations in the number of disease-environmental data points available in
many instances, i.e., their sparsity, it was necessary for us to carry out
many preliminary exercises (some manually, some with the aid of computers)
to get a clear understanding of the problems.

Dot-type (data-point) mapping techniques were found to be directly
applicable to computerization; furthermore, we found that these techniques
can often be more easily and accurately performed by a computer than by a
person.

Shading-type  mapping  techniques  cannot  be  carried  out  by  a  computer 
unless  the  boundaries  of  the  shaded  area  are  completely  defined,  or  unless 
the entire earth's surface is divided into grid boxes and each box
identified. Figure 3-12B shows a successful attempt at shading in connection
with the identification of areas by their skeletons. This method (Pfaltz
and Rosenfeld, 1967) employs a grid, but it is not required that all grid
points be identified; thus computer storage space requirements are
minimized.


Figure 3-36 illustrates a technique which, though performed manually
by the study team, could be computerized. An alternate method to that
used in producing the map of Figure 3-12B is also demonstrated for
comparison, one that utilizes larger grids to reduce further the computer
storage space requirement. Figure 3-36A shows the geographic political unit
boundaries and the data values grouped by province (data taken from Fig.
3-1). Figure 3-36B simulates a computer/plotter produced shaded map of
these data. This technique uses shading patterns that can be represented
entirely within a single grid box, of such a nature that when grid boxes
are combined, the resulting pattern is smooth and uniform. These shading
patterns can be lines, or dots, etc. (see Figs. 3-37A through E), combined
to form the overall patterns, as shown in Figures 3-37F through I. An
alternative technique could be employed directly by a computer and
line-printer to produce a sort of shaded map: the character positions of the
printer would represent a rectangular grid (with boxes i/3 inch by 1/10
inch), and the actual print characters would represent the shading symbols.
Figure 3-36C demonstrates a simulation of this technique, using the same
data. We wrote a simple computer program to demonstrate more effectively
this method (illustrated in Fig. 3-36C), using the standard (schistosomiasis)

Figure 3-36 A, Standard (Venezuelan) schistosomiasis data (Fig. 3-1),
provincial boundaries overlaid with a 1/2° X 1/2° grid; B, shaded to
simulate computer/plotter output, based upon 1/2° grid; C, shaded to simulate
computer/line-printer output, based upon 1/2° grid; D, shaded to simulate
SYMAP program (Fisher et al., 1967). B, C, and D present the same data
shown in A.

Figure 3-37 Shading patterns contained within single grid box: A-E,
single grid boxes containing simple characters to be printed or plotted
within the grid boxes; F-H, single grid boxes containing characters formed
by overprinting two or more of the simple characters in A-E; I, uniformly
shaded area in which the simple characters in each grid box would combine
to give the impression of diagonal rulings.


data set. The results are presented in Figures 3-15 and 3-38. Since the
data were entered on an equirectangular grid, the resulting map is somewhat
unconventional (an elongated rectangular projection), but a useful
result, nevertheless. Additional (schistosomiasis) data were obtained from
the medical literature to extend our preliminary exercises, including
consideration of a new geographic area. Approximately 75 data points were
plotted on an existing base map of Africa. These data represented reports
from a combination of cities, provinces, and regions derived from
information presented by Malek in May, 1961. Due to the limitations of such
data (related to grouping), the maps that we have constructed do not present
disease situations exactly. As we have emphasized before, our primary
objective with these various limited disease-environmental data has been
to explore techniques of manipulating data to produce mappable information.
The geographic location chosen for each of the points was interpolated
manually as each value was placed on the map; then these data were
manually contoured by members of the MOD study team. The results are shown
as Figure 3-39. Figure 3-39A was contoured using conventional cartographic
techniques; Figure 3-39B was contoured essentially the same way, except
for using a computer-like method of interpolation. Although there is
considerable difference in overall appearance (note the smoother flowing
lines in Fig. 3-39A), the maps show definite similarities in the areas
where data points are densest. Where data points are close together and
relatively uniformly distributed over a broad region, contour mapping is
comparatively straightforward, and the resulting contours are unlikely to
be in serious error. However, where data points are few and far between,
problems arise (as illustrated in 3-39B) because it must be assumed that the
data pattern being mapped represents a smooth, homogeneous statistical
surface. Obviously, this is not necessarily a valid assumption.* Some


*The  situation  is  improved  if  one  has  a  general  idea  of  the  nature  of  the 
region  being  mapped.  For  example,  given  spaced  elevation  point-readings 
in  a  particular  area,  one  would  use  quite  a  different  approach  to  inter¬ 
polating  if  he  knew  that  he  were  dealing  with  midtown  Manhattan  as 
compared  with  Kentucky  hill  country. 

Figure 3-38 Shading-type map, produced on a line-printer by an IBM 7090,
presenting standard set of South American schistosomiasis data (Fig. 3-1)
plus oceanic points; this map demonstrates one method of adjusting computer
output (by adding numerous zero-valued data points) to allow for natural
features such as large bodies of water.

rule must be formulated to prevent contouring among data points so widely
separated that linear interpolation becomes highly suspect. Alternatives
to linear interpolation are quadratic, cubic, or higher-degree
interpolation. Unfortunately, such methods, in addition to being more complex,
usually require more points than do linear methods.

The  only  realistic  solution  to  the  problems  posed  by  very  sparse 
data points is to get additional real data. Any system, computer or
otherwise, which gives credence to basically unreliable data does a great
disservice, and we have made every effort in designing the MOD system to see
that  this  will  not  happen.  (For  example,  the  MOD  system  incorporates  a 
professional  evaluation  of  the  data,  a  CEN  ((Computer  Evaluation  Number)), 
a  means  of  identifying  data  in  conflict,  a  NAR  ((Narrative  Output))  to 
supplement  —  and  point  out  limitations  of  —  the  mapped  data,  etc.) 

For production of contour-type maps, the manual algorithm described
earlier (see Figs. 3-17 through 3-23) might be implemented on a computer.
If this were achieved, the method would be at least as good as any program
now existing. (This type of method has not yet been attempted so far as
we know.) Existing methods of computer contouring almost always employ a
grid (rectangular, triangular, hexagonal, etc.) for some aspect of the
process of contour mapping. As a rule, a rectangular grid (usually square)
is employed since this permits easy storage of data in an array. Even
methods which compute the surface statistically resort to gridding in order
to display the results in map form. As in the production of shading-type
maps, both the computer/plotter and computer/line-printer configurations
could produce feasible results.

Once the necessity of gridding data became apparent, the project
team tried several ways of gridding, including circular, triangular, and
rectangular. One of our early experiments utilized the three-point-plane
method suggested by Tobler (1964). Simply stated, it uses only the three
closest data points surrounding a grid point, and fits a plane through
them. Thus the grid point lies on this plane and its value may be
computed. The method, as it applies to our system, is as follows:

(1)  Compute  the  distance  from  the  grid  point  to 
all  observed  points  (which  are  assumed  to  be 
randomly  distributed). 

(2)  Out  of  all  these  points,  find  the  nearest 
three  which  surround  the  grid  point  in 
question. 

(3)  Fit a plane, Z = AX + BY + C, through the
three  points  by  solving  the  system  of 
simultaneous  linear  equations  necessary  to 
fit  a  plane  through  points  whose  X,  Y,  and  Z 
coordinates  are  known. 

(4)  Calculate the estimated value at the desired
grid  point  by  inserting  its  coordinates 
into  the  equation. 

(5)  Continue  to  the  next  grid  point  in  the  row; 
stop when all rows have been examined.

This  method  was  performed  manually  to  produce  the  map  shown  in  Figure  3-40. 
Here,  again,  the  standard  set  of  (schistosomiasis)  data  were  used  to  pro¬ 
duce  a  contour-type  map.  All  manipulations  were  carried  out  visually 
rather  than  calculated,  but  the  concept  can  be  demonstrated  even  by  this 
rough approximation. A 1° grid was used for comparison with other methods.
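
Steps (1) through (5) above can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the procedure actually used by the study team (which was carried out by eye): the function names, the `search` limit on how many nearby points are examined, and the choice of "smallest total distance" to decide which enclosing triple is nearest are all ours.

```python
from itertools import combinations
from math import hypot


def _side(p, a, b):
    """Sign of the cross product (b - a) x (p - a)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _surrounds(p, a, b, c):
    """True if p lies inside or on the edge of triangle a, b, c."""
    s = (_side(p, a, b), _side(p, b, c), _side(p, c, a))
    return not (min(s) < 0 and max(s) > 0)


def three_point_plane(grid_pt, data, search=8):
    """Estimate Z at grid_pt from the nearest three data points surrounding it.

    `data` is a list of (x, y, z) tuples. Returns None when no enclosing
    triple exists among the `search` nearest points.
    """
    gx, gy = grid_pt
    # Step 1: distance from the grid point to every observed point.
    nearest = sorted(data, key=lambda d: hypot(d[0] - gx, d[1] - gy))[:search]
    # Step 2: nearest triple of points that surrounds the grid point.
    best = None
    for tri in combinations(nearest, 3):
        (x1, y1, _), (x2, y2, _), (x3, y3, _) = tri
        det = x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)
        if abs(det) < 1e-12:          # collinear points define no plane
            continue
        if not _surrounds(grid_pt, *[t[:2] for t in tri]):
            continue
        total = sum(hypot(x - gx, y - gy) for x, y, _ in tri)
        if best is None or total < best[0]:
            best = (total, tri, det)
    if best is None:
        return None
    _, ((x1, y1, z1), (x2, y2, z2), (x3, y3, z3)), det = best
    # Step 3: solve Z = A*X + B*Y + C through the three points (Cramer's rule).
    a = (z1 * (y2 - y3) + z2 * (y3 - y1) + z3 * (y1 - y2)) / det
    b = (x1 * (z2 - z3) + x2 * (z3 - z1) + x3 * (z1 - z2)) / det
    c = z1 - a * x1 - b * y1
    # Step 4: insert the grid point coordinates into the equation.
    return a * gx + b * gy + c


def grid_values(data, xs, ys):
    """Step 5: repeat for every grid point, row by row."""
    return [[three_point_plane((x, y), data) for x in xs] for y in ys]
```

Grid points that are not surrounded by any three data points are returned as `None`, so that they can be left uncontoured rather than extrapolated.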

An important point is that choice of grid size has a marked influence
on the map which is produced. The coarser the grid, the more the
map will indicate general trends; the finer the grid, the greater its
detail. This is demonstrated in Figure 3-41, which uses the standard set of
test data and a three-point-plane method of contouring, done manually.
Note that if data points do not coincide with grid points initially, the
net effect of gridding is to smooth out the data. See Figure 3-41 on the
next page.

To evaluate existing computer contouring systems, the standard
(schistosomiasis) data set was submitted to various existing computer
contouring programs.

Figure 3-41 The standard set of (Venezuelan)
schistosomiasis data (Fig. 3-1), illustrating
effects of a coarse (A) versus a fine (B) grid
during contouring operations.

The Control Data Corporation (CDC) offers at its data centers the most
sophisticated contouring system generally available. Sufficient control data
points must be present in order to calculate grid or mesh points on a
rectangular grid. In addition, the original data must be randomly spaced;
thus accuracy is improved by adding points in areas of sparse distribution.
The CDC program system performs essentially three tasks. The first task is
to add points to the data to improve the accuracy of the computation
performed in the second operation. Assuming the relationship between two
adjacent points is linear, a linear interpolation is made to add points at
levels which do not exist in the original data. An array is produced at the
end of this operation, and is sorted and prepared for gridding.

The second task is to calculate grid or mesh point values for a
rectangular grid. Grid point values are determined by finding the nearest
known data points which surround a grid point and then calculating the grid
point value by an inverse distance function. Grid point values are computed
only when data points surround an area; otherwise a "do not contour" value
is attached to the grid point.
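
The gridding task just described can be sketched as follows. The details of the CDC program are not given in this report, so several things here are assumptions made only for illustration: "surround" is interpreted as having a data point in each of the four quadrants around the grid point, the weight is taken as 1/distance raised to a chosen power, and the "do not contour" flag is represented by `None`.

```python
from math import hypot

DO_NOT_CONTOUR = None  # stand-in for the flag value, which the text does not specify


def idw_grid_value(gx, gy, data, power=1.0, min_quadrants=4):
    """Inverse-distance estimate at grid point (gx, gy).

    `data` is a list of (x, y, z) tuples. The nearest data point in each
    quadrant around the grid point is used; if fewer than `min_quadrants`
    quadrants contain data, the grid point is flagged instead of estimated.
    """
    nearest = {}  # quadrant -> (distance, z) of the closest point found so far
    for x, y, z in data:
        d = hypot(x - gx, y - gy)
        if d == 0.0:
            return z  # a data point coincides with the grid point
        quadrant = (x >= gx, y >= gy)
        if quadrant not in nearest or d < nearest[quadrant][0]:
            nearest[quadrant] = (d, z)
    if len(nearest) < min_quadrants:
        return DO_NOT_CONTOUR
    weighted = [(1.0 / d ** power, z) for d, z in nearest.values()]
    return sum(w * z for w, z in weighted) / sum(w for w, _ in weighted)
```

With four equally distant points the result is their simple mean; a grid point with data on one side only is returned as `DO_NOT_CONTOUR`, which is the behavior that keeps contours from being drawn across empty regions.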

The  third  task  is  to  contour  automatically  the  three-dimensional  data 
thus gridded. These data are expressed in X-Y coordinates with a Z value for
contouring.  The  control  values  are  stored  within  a  matrix  through  which  the 
program  traces,  interpolating  to  find  the  points  through  which  the  contour 
lines  pass.  The  contouring  is  performed  in  strips  of  two  adjacent  rows  of 
the  matrix.  Contouring  is  not  performed  where  "do  not  contour"  is  indicated. 
The  results  of  two  parabolic  interpolations  are  traced  to  compute  the  path 
of  each  contour  line.  As  positions  are  calculated,  plotter  commands  are 
stored  in  an  internal  array  and  output  onto  a  plotter  drive  tape  each  time 
the array is filled. Optionally, the location of the data points, values,
and  grid  lines  may  be  plotted. 
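
The idea of tracing a contour level through the gridded matrix, two adjacent rows at a time, can be illustrated with a simplified sketch. It is not the CDC algorithm: it uses linear rather than parabolic interpolation, it only locates the points where a contour crosses grid-cell edges (it does not link them into plotter strokes), and `None` is our stand-in for the "do not contour" flag. All names are ours.

```python
def edge_crossing(p, q, level):
    """Point where `level` is crossed between grid nodes p and q.

    Each node is (x, y, z). Returns None if the level is not crossed or if
    either node is flagged "do not contour" (z is None).
    """
    (x0, y0, z0), (x1, y1, z1) = p, q
    if z0 is None or z1 is None:
        return None
    if (z0 < level) == (z1 < level):
        return None  # both nodes on the same side of the level
    t = (level - z0) / (z1 - z0)
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def contour_points(grid, xs, ys, level):
    """Collect edge crossings strip by strip (two adjacent rows at a time).

    `grid[r][c]` is the Z value at (xs[c], ys[r]).
    """
    points = []
    last_strip = len(ys) - 2
    for r in range(len(ys) - 1):
        lower = [(x, ys[r], z) for x, z in zip(xs, grid[r])]
        upper = [(x, ys[r + 1], z) for x, z in zip(xs, grid[r + 1])]
        edges = list(zip(lower, upper))           # edges joining the two rows
        edges += list(zip(lower, lower[1:]))      # edges along the lower row
        if r == last_strip:
            edges += list(zip(upper, upper[1:]))  # top row, final strip only
        for p, q in edges:
            pt = edge_crossing(p, q, level)
            if pt is not None:
                points.append(pt)
    return points


# A 2 x 2 grid rising from 0 on the left to 10 on the right.
print(contour_points([[0, 10], [0, 10]], [0, 1], [0, 1], 5))
```

For this small grid the level 5 is crossed halfway along the bottom and top edges; replacing one node by `None` suppresses the crossings on the edges that touch it.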

Our standard set of data was contoured by CDC, utilizing a variety of
parameters.  Figure  3-42  shows  the  data  contoured  using  a  1°  grid,  for  pur¬ 
poses  of  comparison  with  other  maps  (also  because  this  was  the  level  of 
accuracy  to  which  the  original  data  was  given).  Figure  3-43  shows  the  data 
contoured  utilizing  a  much  coarser  grid  —  approximately  2°.  Here,  only 
trends  show.  The  map  shown  in  Figure  3-42  presents  some  rather  surprising 
results,  and,  as  it  stands,  does  not  give  an  adequate  picture  of  the  dis¬ 
ease  situation  (as  compared  with  Figure  3-28,  drawn  manually  from  the  same 
data  by  an  experienced  cartographer) .  The  reason  seems  to  lie  in  the  manner 
in which the system manufactures "fill-in" data points. It can be seen that
a  relationship  exists  between  each  zero-value  point  and  each  positive-value 
point.  Venezuela  positive-data  were  combined  with  zero-value  data  through 
Colombia,  Peru,  Bolivia,  and  western  Brazil  to  create  a  false  impression. 


A  similar  false  impression  is  given  in  the  Colombia-Brazil  border  area. 
Another  set  of  data  was  also  used  in  investigating  the  CDC  program.  These 
dealt  with  rabies  in  the  eastern  United  States  and  were  produced  from 
information  supplied  to  us  by  the  National  Communicable  Disease  Center. 

Figure 3-42 The standard set of schistosomiasis data (Fig. 3-1),
contour-mapped by Control Data Corporation contouring system (using a CDC
3600 computer and an offline CalComp plotter), using a 1° grid (continent
outline added manually).


Figure  3-44  shows  these  data  in  the  form  of  a  map  produced  with  the  CDC 
program.  Unfortunately,  the  MOD  project  was  brought  to  a  close  before  the 
implications of this experimental result could be explored.

The  University  of  Michigan  Geography  Department  (Tobler,  1967)  has  a 
line-printer  contouring  program  which  has  been  used  successfully  by  them 
under a variety of circumstances, including situations with both large and
small  numbers  of  data  points.  This  program  utilizes  a  square  grid  for  con¬ 
touring.  A  simple  smoothing  technique,  called  the  moving  average,  is  ap¬ 
plied  to  the  data  to  make  trends  more  apparent.  The  selection  of  grid  size 
is  determined  automatically  by  the  computer,  which  results  in  a  coarse  grid 
when  there  are  few  points  and  a  fine  grid  when  points  are  numerous  —  based 
on  the  overall  area  and  the  total  number  of  points.  The  grid  size  selected 
also  determines  the  overall  rectangle  to  be  contoured  (which  can  be  some¬ 
what  larger  than  the  original  area) .  Grid  point  values  are  determined  by 
one of two ways: (1), if the data point is less than an arbitrary distance
away  from  the  nearest  grid  point,  its  value  is  used  for  the  grid  point  or 
(2),  if  the  data  point  is  not  sufficiently  close  to  the  grid  point,  it  is 
determined  by  a  weighted  average  between  the  closest  point  and  the  six 
closest points (the closest point is included in the "six closest points").
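
The two-way rule for grid point values can be sketched as below. The report does not state how the average in rule (2) is weighted, so inverse-distance weights over the six closest points are assumed here purely for illustration; the function name and the `snap_distance` parameter (standing for the "arbitrary distance") are also ours and do not come from the Michigan program.

```python
from math import hypot


def grid_point_value(gx, gy, data, snap_distance, k=6):
    """Value at grid point (gx, gy) from scattered (x, y, z) data.

    Rule (1): if the closest data point lies within `snap_distance`,
              its value is used directly.
    Rule (2): otherwise a weighted average of the `k` closest points is
              taken (weights assumed to be 1/distance).
    """
    if not data:
        raise ValueError("no data points supplied")
    ranked = sorted((hypot(x - gx, y - gy), z) for x, y, z in data)
    d0, z0 = ranked[0]
    if d0 < snap_distance or d0 == 0.0:
        return z0
    closest = ranked[:k]
    weights = [1.0 / d for d, _ in closest]
    return sum(w * z for w, (_, z) in zip(weights, closest)) / sum(weights)
```

Because the closest point always appears among the six and carries the largest weight, a sketch of this kind behaves continuously as a data point moves across the snap distance only approximately; the actual program may well treat the closest point differently.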

Our standard set of test data was contoured by their program, and the
result interpreted to produce the map shown in Figure 3-45. Figure 3-46
shows the computer-generated output map. This map shows the effect of a
large grid (i.e., it presents trends only); the grid size, approximately
8°, was selected automatically on the basis of total number of points and
the area covered.

Most  of  the  existing  computer  programs  for  mapping  are  designed  to 
use a large number of data points. When only a small number of points is
available,  trends  may  be  the  only  meaningful  result  which  can  be  obtained. 
In  cases  of  this  sort,  mathematical  techniques  can  sometimes  be  applied  to 
all  the  data  points  together,  and  will  give  a  better  indication  of  trends 

Figure 3-44 Unpublished rabies data from National Communicable Disease
Center: A, contoured by a Control Data Corporation system (equirectangular
projection); B, contoured manually (Albers equal-area projection),
courtesy of V. T. Garofalo. As was the case with the schistosomiasis maps,
the computer-drawn map indicates only broad trends in the data and is not
precisely similar to the manually drawn map.

Figure 3-45 The standard set of South American schistosomiasis data
(Fig. 3-1) machine-mapped using University of Michigan contouring program.
Contours in this figure were manually traced from the output shown in
Figure 3-46, on next page.

Figure 3-46 The computer (IBM 7094)/line-printer produced map from
which were taken the contour lines shown in Figure 3-45.

than standard mapping techniques. Such three-dimensional data can often be
summarized and interpreted by reduction to a simple geometric shape. A
geometric surface can be fitted to the data by an extension of the
least-squares curve-fitting procedure used in curvilinear correlation and
regression analysis. The resulting fitted surface is an approximation or
"trend" of the data. A sixth-degree trend surface requires a minimum of 28
data points (28 points would give an exact fit since residuals would be
zero), and we explored this approach.

A computer program developed at the University of Kansas (O'Leary,
Lippert, and Spitz, 1966) for use in geological applications, was used to
calculate a sixth-degree polynomial surface for the standard set of
(schistosomiasis) data utilized in these map studies. The program was also
used to contour the results on a line-printer. This program system first
calculates a coefficient matrix and column vector, then solves the matrices,
ordering before each elimination. It then calculates and prints trend
surface Z values (for degree one through degree six), residuals, error
measures, and equations of surfaces. Next, the trend surfaces requested are
calculated and printed as a contour map using alternating bands of symbols
and blanks to identify contour lines and the regions between these lines.
If desired, Z values (Fig. 3-10) and residuals are plotted as a dot-type map
on the printer. Figure 3-47 shows the interpreted computer output in a form
similar to the other results, and Figure 3-48 shows the computer-generated
output map.
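
The sequence described above (form a coefficient matrix and column vector, reorder before each elimination, solve, then evaluate the surface) can be sketched in a few functions. This is not the Kansas program: the names are ours, the normal equations are formed directly, and no scaling of coordinates is done, so the sketch is suitable only for low degrees and small, well-spread data sets. It also shows where the figure of 28 points comes from, since a full polynomial of degree n in X and Y has (n + 1)(n + 2)/2 coefficients.

```python
def n_terms(degree):
    """Number of coefficients in a full polynomial surface of a given degree."""
    return (degree + 1) * (degree + 2) // 2


def design_row(x, y, degree):
    """Monomials x**i * y**j with i + j <= degree, ordered by total degree."""
    return [x ** (t - j) * y ** j for t in range(degree + 1) for j in range(t + 1)]


def solve(matrix, vector):
    """Gaussian elimination, reordering rows (partial pivoting) before each step."""
    n = len(vector)
    a = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or badly placed points")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def fit_trend_surface(points, degree):
    """Least-squares coefficients of a trend surface through (x, y, z) points."""
    m = n_terms(degree)
    if len(points) < m:
        raise ValueError("degree %d needs at least %d points" % (degree, m))
    rows = [design_row(x, y, degree) for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Coefficient matrix and column vector of the normal equations.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(m)]
    return solve(ata, atz)


def trend_value(coeffs, x, y, degree):
    """Evaluate the fitted surface at (x, y)."""
    return sum(c * t for c, t in zip(coeffs, design_row(x, y, degree)))


print(n_terms(6))  # 28
```

Residuals are obtained by subtracting `trend_value` from each observed Z; with exactly `n_terms(degree)` well-placed points they vanish, which is the "exact fit" case noted in the text.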

This method should normally produce acceptable trends, but, in this
example (Fig. 3-47), it fails in the Peru-Colombia-western Brazil area,
presumably due to the lack of control points in that area. On this set of
test data the University of Michigan program gave a somewhat better result
than the sixth-degree polynomial method. In general, the weighted moving
average gives results comparable with those obtained by the fitting of local
polynomials, and the result is computationally much simpler. (A different
polynomial fitted to these data points would have given a different, though
not necessarily more acceptable, trend surface.)

Figure 3-47 The standard set of South American schistosomiasis data
(Fig. 3-1) contour-mapped as a sixth-degree trend surface by the Kansas
Geological Survey trend-surface program. Contour lines were traced
manually from the output map shown in Figure 3-48.

Figure 3-48 Computer (IBM 7090)/line-printer produced map from which
contour lines of Figure 3-47 were drawn.

Other methods have been studied to a limited extent, but, in general,
have  not  been  found  to  be  applicable.  For  example,  a  double-Fourier  series 
program  exists,  but  requires  more  points  for  accuracy  than  will  usually  be 
available;  vector  and  factor  analyses  are,  as  a  rule,  not  suitable. 

Throughout these experiments several techniques were evaluated, testing
alternate methods, etc., seeking ways to improve output. For example, we
tried using the same value for each grid point falling within a single
political unit. Compare the base map shown in Figure 3-36A with Figure 3-49;
Figure 3-49A shows the result if the data value for each province is entered
at each grid point within the province before contouring. Figure 3-49B is a
map based on the same data, but this time the data points were located at
the center of each province. In general, spreading out the data is more
useful for shading, whereas averaging the "center point" data is more useful
for contouring. Both methods could probably be applied to both types of
mapping, within certain limits. For example, when the data are spread out
over a grid, they appear to be more suitable for shading as the grid is made
smaller, and more acceptable for contouring as either the grid or the
contour interval is made larger. Figure 3-49A would be improved if either a
coarser grid or a larger contour interval were used. Alternatively,
center-point data are better for contouring when the points are fairly
randomly spread over the area to be contoured, and better for shading when
the area boundaries are not known. Figure 3-49B does not have an adequate
number of points in the lower half (based on the grid size used), and the
contour lines wander accordingly. When the extent of an area described by a
data point is unknown, the data value could be carried half-way to the
adjacent data point for shading (as shown by Fig. 3-16D). This method is
described by Harvard University in their SYMAP system (Fisher et al., 1967).
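
The half-way rule for shading can be sketched in a few lines of Python. This
is only an illustration of the idea and not the SYMAP implementation: the
function name `shade_by_nearest_point`, the flat (x, y) coordinates, and the
sample values are all assumptions made for the example. Each grid position
takes the value of the closest data point, which is equivalent to carrying
every value half-way toward its neighbours.

```python
import math

def shade_by_nearest_point(data_points, x_coords, y_coords):
    """Assign each grid position the value of its nearest data point.

    data_points: list of (x, y, value) tuples (hypothetical layout).
    Returns a list of rows, one row per y coordinate.
    """
    grid = []
    for gy in y_coords:
        row = []
        for gx in x_coords:
            # The nearest point wins, so each value extends half-way
            # toward every adjacent data point.
            nearest = min(data_points,
                          key=lambda p: math.hypot(p[0] - gx, p[1] - gy))
            row.append(nearest[2])
        grid.append(row)
    return grid

# Two invented data points on a single grid row (illustration only).
points = [(0.0, 0.0, 5), (4.0, 0.0, 20)]
shaded = shade_by_nearest_point(points, [0, 1, 3, 4], [0])
```

In this sketch the boundary between the two shaded regions falls at x = 2,
midway between the two data points.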

When area boundaries are known, it is preferable to spread out the
data over a grid for shading by computer because the resulting shaded map
is reduced to a dot-type map with a very large number of dot positions, and
each dot position can be filled with the appropriate symbol. In Figure 3-371,

Figure 3-49 The standard set of (Venezuelan) schistosomiasis data
(Fig. 3-1) contour mapped (compare with Fig. 3-36A): A, with data for
each province spread out over a 1/2° grid; B, with data for each province
grouped at its (areal) center.

data points of Figure 3-16, and (by 7's) the unknown data/grid points in
the simulated shading would have been done by a plotter, a line in one box
connecting  with  another  line  in  the  next  box  (actually,  the  entire  line 
would  be  drawn  in  one  stroke,  as  the  computer  would  test  for  the  end). 

Since data are gridded for contouring, and the grid size determines
the form of the output, we considered techniques for varying grid size on
the same map, but with little success. As an alternative, we produced
separate maps (with different grid sizes) for various parts of the overall
area and joined these manually. (Varying grid sizes requires manual
adjustment to line up the contours when the maps are fitted together.) Lack
of time prevented further investigation of variable gridding techniques, and
several ideas remain to be explored. For example, a grid file could be set
up to include different-sized grids in different regions, based on medical
workers' empirical recommendations; alternatively, grid size could be
varied, depending on the number of data points and their average spacing.

In another experiment known data points were represented by appropriate
symbols and all the unknown grid points were also represented. The resulting
map (Fig. 3-50) is very difficult to understand although it is derived from
exactly the same simple data as depicted in Figure 3-18. From this it appears
that, as a rule, unknown points should not be indicated as such, and this
conclusion is in line with standard cartographic procedure.

Information as to unknown data points can be implied by presenting
dot-type symbols on a shaded or contour map. From the dot-type symbols it
will be quite evident where known data were used and where interpolated
values were added. With this information, the knowledgeable user can
challenge questionable areas (and, perhaps, set about obtaining data for
these areas).

In another experiment we compared the results of a general or trend
map (Figs. 3-28, 3-40, 3-43, 3-45, and 3-47) with a more detailed map (Fig.
3-51). The data used in the previous studies were plotted as accurately
as possible without grouping and averaging them by province. These data were
then gridded and contoured manually. Needless to say, trends are not as
clearly distinguishable on this overall map (Fig. 3-51A). But if we look
at Figure 3-51B, which is an enlargement of one section of 3-51A, it appears
that this technique of grouping data may be more useful in small-area
studies. MOD experimental maps produced in this manner compare surprisingly
well with published maps presenting comparable information (Fig. 3-52).

Malek in May, 1961, p. 305-313) contoured by spreading data over entire
extent of each reporting area rather than by taking data at
"center-of-gravity" of each reporting area or by grouping all data into
province-sized reporting areas: A, overall region. (3-51B is on page 3-93)

One factor that pertains to contouring and which places some
limitations on the output is that data extremes and data planes cannot be
contoured. Note the zero contour line in Figure 3-42, for example. The
computer assumes this line is ever-present anywhere in an area of zero data
values, and is quite unsuccessful in contouring it. This is because zero
is the lowest value on the map and, in order to contour, it is necessary
to have values both lower and higher than the contour line to be drawn. It
might not be apparent, but a plane composed of several values which should
fall on a contour line, likewise, cannot be reconciled by a computer. This
is exemplified by Figure 3-53, which represents a section of a map produced
from the standard MOD test data by the Naval Oceanographic Office (Osborn,
1967). The grid size used was about 1 1/2°, and is quite apparent on the
map. The 1% contour line, labelled 101% on the map, appears as a plane
(identified by the arrow) on which the computer traces between all grid
points.

A common solution could be applied to both these problems. Before
contouring, all grid point values could be offset downward by an identical
minute amount to ensure that no value falls on a contour line. This would
make all values slightly lower than reported, and would thereby permit
construction of the zero contour and all other contours, as any planes would
fall between contour lines.
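
The effect of the proposed offset can be demonstrated with a small Python
sketch. It is an illustration under stated assumptions, not any program
tested by the project: the helper names `offset_downward` and
`contour_crosses`, the size of `epsilon`, and the sample grid are all
invented for the example.

```python
def offset_downward(grid, epsilon=1e-6):
    """Lower every grid value by the same small amount before contouring."""
    return [[value - epsilon for value in row] for row in grid]

def contour_crosses(a, b, level):
    """A contour at `level` can be drawn between two grid points only
    when one value lies strictly below it and the other strictly above."""
    return (a < level < b) or (b < level < a)

# Invented prevalence values (%): a zero area and a flat plane at 1%.
grid = [[0.0, 0.0, 5.0],
        [1.0, 1.0, 5.0]]
shifted = offset_downward(grid)

# Zero contour between a zero value and a higher value.
zero_before = contour_crosses(grid[0][1], grid[0][2], 0.0)
zero_after = contour_crosses(shifted[0][1], shifted[0][2], 0.0)

# 1% contour inside the flat plane, and at the edge of the plane.
plane_inside_after = contour_crosses(shifted[1][0], shifted[1][1], 1.0)
plane_edge_after = contour_crosses(shifted[1][1], shifted[1][2], 1.0)
```

Before the offset the zero contour cannot be placed, because no value lies
below zero. After the offset it can, and the 1% contour is no longer traced
between points inside the plane but only along the plane's edge.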

Figure 3-52 Schistosomiasis data for eastern Brazil; A (upper),
MOD-produced map (from data of Malek in May, 1961, p. 303-313); B (lower),
published map showing comparable data (shaded areas indicate 30% +
prevalence), from Pathology in Brazil, Past and Present. International
Pathology, vol. 8, p. 8, 1967 by Domingos de Paola, used with permission.

Figure 3-53 The standard set of schistosomiasis test data (Fig. 3-1)
for eastern Brazil contour-mapped by the Naval Oceanographic Office
program. The arrow points to an area in which (because contoured surface is
a flat, horizontal plane with a value of 1% infection rate) the computer
traced between all grid points of equal value.

As a final study of gridding, see Figure 3-54, where we have reproduced
figures from a gridding study performed by the University of Kansas
(Preston & Harbaugh, 1965); Figure 3-54A shows a very detailed topographic
map. Data values were taken from this map at regular intervals (which
represents the construction of the grid) and recontoured manually (Fig.
3-54B) to show the best representation possible from this size grid. Note
that the resulting gridded map compares favorably (but not perfectly) with
the original topographic map. It would be interesting and valuable (and
ultimately necessary) to take a map of a single environmental factor,
digitize it (by gridding), and reconstruct the map using the MOD system;
this will be one of the crucial tests to judge whether or not the
computerized mapping system is ready for use.
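
A round-trip test of this kind can be outlined in Python. The sketch below
is only a stand-in for the test proposed above: it digitizes a known surface
on a regular grid, reconstructs intermediate values, and reports the largest
discrepancy. Bilinear interpolation is used here purely as an assumed
reconstruction method (the report does not specify one), and every function
name and the sample surface are hypothetical.

```python
def sample_on_grid(surface, nx, ny, step):
    """Digitize a surface by reading it at regular grid intervals."""
    return [[surface(i * step, j * step) for i in range(nx)]
            for j in range(ny)]

def bilinear(grid, step, x, y):
    """Reconstruct a value between grid points by bilinear interpolation."""
    i = min(int(x // step), len(grid[0]) - 2)
    j = min(int(y // step), len(grid) - 2)
    tx = (x - i * step) / step
    ty = (y - j * step) / step
    near = grid[j][i] * (1 - tx) + grid[j][i + 1] * tx
    far = grid[j + 1][i] * (1 - tx) + grid[j + 1][i + 1] * tx
    return near * (1 - ty) + far * ty

def max_reconstruction_error(surface, grid, step, probes):
    """Largest difference between the original and reconstructed values."""
    return max(abs(surface(x, y) - bilinear(grid, step, x, y))
               for x, y in probes)

def smooth_surface(x, y):
    # Stand-in for a mapped environmental factor (an assumption).
    return 100.0 + 3.0 * x + 2.0 * y + 0.5 * x * y

grid = sample_on_grid(smooth_surface, nx=5, ny=5, step=1.0)
probes = [(0.5, 0.5), (1.25, 3.75), (3.9, 0.1)]
error = max_reconstruction_error(smooth_surface, grid, 1.0, probes)
```

A gently varying surface such as this one is recovered almost exactly,
whereas a surface with detail finer than the grid interval would show a
measurable loss, much as Figure 3-54B loses detail relative to Figure 3-54A.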

In this consideration of map construction, we have concentrated on
problems involving data points, that is, data grouping and interpolation
between "real" data (groups), particularly in connection with computer
program/plotter operations, and this emphasis was appropriate. We should not
conclude our discussion, however, without at least a brief consideration of
accuracy requirements in computer/plotter operations. In a recent
presentation (1967), M.A. Richardson and J.S. Rollett discussed this problem
(and other important ones, e.g., quantities of point and line information
required for an effective data bank, etc.). They described some of the
critical situations (relating to accuracy), including several of particular
interest to our contouring activities:

•  Crossing  of  lines  intended  to  run  closely  side  by  side. 

•  Kinks  on  lines  and  failure  of  loops  to  close. 

•  Failure  to  place  a  point  symbol  symmetrically  on  a  line. 

It was their conclusion that "... the machines used should be such
that not more than 1% of errors will lead to mismatches of features which
exceed 0.006", and ... this implies an accuracy of positioning a given
feature of plus or minus 0.003"." It should be noted that these statements
related to "a computer compatible system for automatic cartography" capable
of producing complete (conventional-type) high quality maps. Furthermore,
their concern was not so much in whether or not the machine produced visible
defects, but, rather, in how often these were likely to occur and how much
(human cartographer) effort would be required to put things right. Their
requirement of ±0.003" accuracy was chosen because it is within the limits
of an economic load of hand corrections.

Figure 3-54 A, Topographic (contour) map of the northern part of the
Lone Star Quadrangle, Kansas, showing an area 4.5 x 2.25 miles; B (left),
data point values taken from A (right) at intervals of about 0.1 mile
according to a rectangular grid pattern, printed out on line-printer and
re-contoured manually for comparison with A, the original map. Although
much detail is lost by gridding data, the broad trends are still quite
evident. Preston & Harbaugh, 1965, p. 64-65; courtesy of State Geological
Survey of Kansas.

The MOD system, of course, has a different objective; it is concerned
with production of maps which, as a rule, will present limited kinds of
data on one sheet to be used in conjunction with base maps. As we have
discussed before, the medical-environmental data are of more abstract nature
than roads, mountain peaks, ponds, territorial boundaries, etc., and their
positions must be calculated. Since, at best, these calculations represent
(close) approximations, the degree of accuracy necessary in MOD system maps
is not nearly so great as that required by Richardson and Rollett. Although
we have not explored accuracy requirements in depth, it is our opinion that
several plotters, currently available, will satisfy requirements for the
MOD system, at least for the present. The weakest link in the long chain
which contributes to inaccuracies is the link that represents the raw data.
Next in order of importance is the link which represents the method of
interpolation between real data-points.

3.3  BLOCK  DIAGRAMS 

Block diagrams are an offshoot of maps in that they attempt to give a
perspective view of the form of the land or of a statistical surface by
presenting it obliquely; recall that, upon viewing a map, the orientation
is perpendicular to the mapped surface. The term "block diagram" is
well-established in the geologic and geographic literature; "perspective
drawing" is also used (but often with a broader meaning). Block diagrams are
often called three-dimensional maps, but the only true three-dimensional
maps are "relief maps". Histograms may also be prepared in perspective,
attempting to depict three dimensions. These are essentially block diagrams
of a three-dimensional step-function (which could also be represented by
shading-type maps, with the data based on area boundaries).

Figures 3-55 and 3-56 show block diagrams produced to display
non-disease data. The standard set of test (schistosomiasis) data (Fig.
3-1), used by the MOD project to produce various types of maps, is also
shown here as block diagrams, first, as drawn manually (Fig. 3-57), then, as
drawn by a computer/plotter at the University of Michigan (Fig. 3-58).

The program which produced this last-mentioned block diagram (Fig.
3-58) is the only such program that was investigated in the time available
to us. Processing of the data for the block diagrams was done by an IBM
7094 computer at the University of Michigan; plotting of the diagrams was
performed by an off-line CalComp plotter. This program successfully
represented the standard set of schistosomiasis data in a vivid
block-diagram format; however, further investigation will be necessary to
test these techniques adequately.

3.4  GRAPHS 


We have touched upon the usefulness of graphs, in general, and their
possible application to the MOD system (3.1.3). Let us now explore a little
deeper the idea expressed there of a "family of curves", each curve
considering three major disease-environmental factors: two of these
variable, the third held at a particular (fixed) level. An approach of this
sort could be of great value to the research epidemiologist or others
interested in exploring precise relationships among various causal factors
of a particular disease. The principal deterrent to this approach is the
"depth" and extent (geographically) of the data that are required. Among
other things, it appears that these restrictions would almost require that
the area under consideration be a relatively small one. We consider the
following hypothetical disease-environmental situation to illustrate the
application and potential usefulness of the method. The two variables
considered in the graph are: (1) prevalence of human leptospirosis, and (2)
salinity of surface water; the fixed factor is the pH of surface water.
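
The arrangement of data into a family of curves can be sketched in Python as
follows. This is a minimal illustration, assuming each observation is held
as a simple record; the function name `family_of_curves`, the field names,
and all of the numbers are invented for the example and carry no
epidemiological meaning.

```python
from collections import defaultdict

def family_of_curves(records, x_key, y_key, fixed_key):
    """Group records by the fixed factor; each group forms one curve."""
    curves = defaultdict(list)
    for record in records:
        curves[record[fixed_key]].append((record[x_key], record[y_key]))
    # Sort each curve by its x value so it can be plotted left to right.
    return {level: sorted(points) for level, points in curves.items()}

# Invented observations, one per square mile (illustration only).
records = [
    {"ph": 7, "salinity_mg_l": 8000, "prevalence_pct": 4.0},
    {"ph": 7, "salinity_mg_l": 2000, "prevalence_pct": 9.0},
    {"ph": 5, "salinity_mg_l": 2000, "prevalence_pct": 1.0},
    {"ph": 5, "salinity_mg_l": 8000, "prevalence_pct": 0.5},
]
curves = family_of_curves(records, "salinity_mg_l", "prevalence_pct", "ph")
```

Each key of `curves` is one level of the fixed factor (pH), and each value
is the curve of prevalence against salinity at that level.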
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BLOCK  DIAGRAM 


Figure 3-55 Published, manually drawn block diagram portraying the
form of the land's surface near Madison, Wisconsin. From Elements of
Cartography, 2nd ed., by Robinson, A. H., 1960, published by John Wiley
and Sons, Inc., New York and reproduced with permission.

Figure 3-56 Published block diagram showing distribution of human
population in the United States in 1960; diagram produced by an IBM 7094
using an offline ink-on-paper CalComp plotter, at the University of
Michigan Department of Geography.

Figure 3-58 A block diagram, presenting the standard set of
schistosomiasis test data (Fig. 3-1), produced by an IBM 7094 using an
off-line plotter, with the University of Michigan's Department of Geography
block-diagram program (continent outline added manually).

We have chosen 100 square miles as a reasonable geographic area to
consider in this hypothetical situation, and have (theoretically) reliable
data available from most of the area. In 38 of the square miles, because
of inadequate human or rat populations, or insignificant surface water,
etc., appropriate data could not be collected. Here is the result:
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The  family  of  curves  shown  above  represents  a  hypothetical  situation 
since  the  data  were  hypothetical,  and  this  theoretical  exercise  was  simply 
to  illustrate  the  potential  usefulness  of  a  method.  If  the  graph  had  been 
based  on  real  data,  one  could  reasonably  conclude  that: 

    The number of rats in a given area bears a direct relationship to
    the prevalence of human leptospirosis, but this influence is
    significant only under the limiting conditions: (1) pH of surface
    water is within the range of 7 to 10, and (2) salinity of surface
    water is less than 12,000 mg per liter.

These data were selected and arranged so that the results would be
compatible with our general knowledge of human leptospirosis; however, so
far as we are aware, this knowledge has not been supported by data presented
and analyzed in the manner that we have described here.

It is known that leptospires do not survive for long (in an infective
state) unless surface water is in the general pH range of 7 to 10. There is
also evidence to suggest that salinity of surface water influences,
independently, survival time of the organisms, but the effects of salinity
have not been firmly established. An analysis such as the hypothetical one
we have described would go far toward resolving the issue. Other important
causal relationships might become evident upon creating additional families
of curves in which, for example, the first of the two variables was
prevalence of human leptospirosis, and the second, water content of first
one specific cation, then others, in turn, the fixed factor being the pH of
water. (In selecting the data for these families of curves, obviously, it
would be important to choose geographic areas where rat populations were
reasonably comparable.)

3.5  CONCLUSIONS 


From our rather detailed analysis of output we have concluded that it
is possible to produce meaningful disease-environmental maps by computer,
and, furthermore, that such production is feasible. We have pointed out the
very considerable limitations of data, and have shown that many data cannot
be meaningfully mapped. Furthermore, we have emphasized that the computer
system itself cannot perform an analysis of the maps that it produces, but
that it can provide useful supporting information, in narrative form,
describing certain limitations of the data and providing supplemental and
complementary information of a type that cannot be mapped.

Methods of using disease-environmental maps have been considered, and
we have described the advantages of keeping these rather simple, and
printing them on transparent sheets so that they can be overlaid, one on
another (and on appropriate base maps), and compared, with particular
concern for coincidence of patterns that would connect various factors
involved in the (multifactorial) cause of the disease in question. We
believe that visual pattern recognition will continue to be the major method
of analysis in most instances because the variation in data coverage and the
non-quantitative nature of many of the data will continue (in the
foreseeable future) to interfere seriously with rigorous statistical
treatment of those data.

The  use  of  block  diagrams,  to  give  a  "three  dimensional"  appearance  to 
some  aspects  of  disease-environmental  factors,  has  been  described  and  illus¬ 
trated,  and  we  have  concluded  that  this  is  a  valuable  form  of  output. 

We have considered the application of a particular kind of graph which,
when used in a proper series, leads to the production of a "family of curves"
that allows a "three dimensional" approach to the problem. This method is,
theoretically, a very valuable one but, because of severe demands on the
kinds and amounts of data required, would probably be most useful in
prospective studies that involved relatively small areas.

Possible extensions of the MOD system have been considered, but these
considerations have, of necessity, been exploratory. Serious efforts to
design some of these methods of extension will not be possible until the
system is operational. For example, mathematical models could be designed
for purposes of predicting future disease-environmental relationships,
given proposed changes in the ecology of a particular area (e.g., the
construction of a large dam, or the elimination of a forest, or an extensive
program of irrigation converting land to new usage, etc.). It would also
be possible to design mathematical models useful in predicting changes in
incidence of a given disease based upon past behaviour of the particular
disease in a comparable environmental area. Furthermore, matrices of
factors could be prepared, analyzing the extent of correlation between each
pair of factors. If this approach proved feasible, the computer system
itself could compare and analyze data which, otherwise, it would be
necessary to output as several different maps. Obviously, such a capability,
even if it were very elementary, could be useful in determining which
disease or environmental factors were the most important ones to output in
map form.
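
An elementary version of such a factor matrix can be sketched in Python.
The report does not say which measure of correlation was intended; the
Pearson coefficient is assumed here only because it is a common choice, and
the factor names and values are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(factors):
    """Correlate every pair of factors measured at the same localities."""
    names = sorted(factors)
    return {a: {b: pearson(factors[a], factors[b]) for b in names}
            for a in names}

# Invented values at four localities (illustration only).
factors = {
    "rainfall": [10.0, 20.0, 30.0, 40.0],
    "prevalence": [2.0, 4.0, 6.0, 8.0],
    "elevation": [400.0, 300.0, 200.0, 100.0],
}
matrix = correlation_matrix(factors)
```

Pairs with coefficients near +1 or -1 would be candidates for output in map
form; pairs near zero could be given lower priority. The sketch assumes that
every factor has a numeric value at every locality, which, as noted
elsewhere in this report, is often not true of disease-environmental data.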


*  *  * 


Finally, and most important, we have concluded that, even though
the MOD project has been designed with primary consideration for output of
information in map form, as an over-all system, its potential applications
go well beyond "mapping of disease". Certain methods which we have developed
(particularly in connection with structuring data), and several of the
techniques which we have devised, are very pertinent to the structuring and
manipulation of many kinds of data. We believe that, properly used,
these methods would be very helpful in converting these data to information.

ABSTRACT - Without an adequate data base no computerized system can
function effectively. This section concentrates on the qualities of
disease-environmental data in relation to computer manipulation leading to
map output. A method of structuring data is developed that provides
proper preparation (preprocessing) for computer input. A factor catalogue is
developed and types and characteristics of data sources are discussed.


"Bad use of language (in the sense of confusing misuse)
usually leads to unresolved controversies."


Anatol Rapoport

4.0  GENERAL  CONSIDERATIONS


There  is  a  vast  amount  of  information  available  pertaining  to  the 
ecology  of  disease,  but  this  information  has  been  directed  to  such  a  variety 
of  (discipline-oriented)  recipients,  that  it  lacks  unifying  characteristics. 

Not only is there the problem of jargon, i.e., the language peculiar
to each discipline, but there is a broader semantic problem in that the
same word may have different implications when used by the geographer, the
geologist, the agronomist, the limnologist, the parasitologist, the
epidemiologist, the veterinarian, the pathologist ... Furthermore, the
structure of (data filled) sentences and phrases differs significantly.


One of the (perhaps the) most important and difficult tasks in
developing the MOD system was to design a method by which the pertinent data
could be extracted from widely varying source documents and structured
(preprocessed) so that they could be input into the computer system in a
form suitable for manipulation that would yield mapped (and narrative and
tabular) output.

In broad terms we required a data-structuring system that would cut
across disciplinary boundaries (and foreign language barriers), allowing us
to fit various kinds of data into a unifying framework that would provide
crisp specifications of:


    The thing itself                        WHAT
    The quality of the thing                WHAT ABOUT IT
    The value (measure) of that quality     HOW MUCH

    (QUALITATIVE / QUANTITATIVE)

and then make this quality/quantity complex mappable by attaching to it a
specific location and time characteristic:

    WHERE and WHEN

A data structuring system that can meet these requirements would
suffice for the MOD project, but would have the capability of serving in
other areas too. It could handle virtually any sort of data which had a
"place" (and time) distribution, e.g., data pertaining to a body or an organ
or a tissue, or to a machine, or to a population ("space structured" in
terms of age and sex and occupation), or to land usage, or to character and
distribution of resources, etc., etc.

Computer technology has progressed to a very sophisticated stage, and
the hardware available to the biologically oriented scientist is developed
far beyond his capacity to use it. The greatest single obstacle to the
full-scale effective use of what computer technology has to offer is the
lack of such a data structuring system as we have just described. Without
effective input there can be no effective storage and retrieval.
Storage/retrieval has reached its highest level of (computer) use in
relation to various accounting procedures and in maintaining current
inventories of material, etc. These are areas in which selection of data to
be stored poses no great problem, nor is there any particular difficulty in
characterizing the material for ready retrieval since it is already in
simple, direct qualitative/quantitative terms.

Since  maps  are  the  output  upon  which  the  MOD  system  concentrates,  and 
since  maps  are  ordinarily  constructed  by  selecting  a  set  of  data  points  from 
which  various  cartographic  representations  can  be  drawn,  let  us  first  define 
maps  and  data  points. 


4.1  DATA  STRUCTURING  TERMINOLOGY

(1) MAP:

    Definition: A graphic/visual presentation, on a geographic-coordinate
    basis, of the information imparted by a particular set of specific
    data points.

(2) DATA POINT:

    Definition: A specific geographic locality where a particular
    factor/aspect/facet of the total disease/environmental situation has
    been determined/observed/measured, and the result/evaluation/value
    expressed in some qualitative/quantitative form.

We will return to these two definitions later, after discussing
some other necessary ideas.


Note that three fundamentally different concepts are implicit in the
definition of a data point: a precise geographic location, a specific value
(either a word or a number), and an exact description of the
disease/environmental factor involved. In order to understand these three
concepts better, let us define and illustrate:

(1) LOCATION (LOC):

    Definition: The exact geographic position of the data point,
    stated as precisely as possible.

    Examples: For purposes of the MOD system, the LOC of each data
    point can be stated in either of two ways:

    (a) As the name of a political unit, such as:

        Pope County, Illinois, U.S.A., North America, or

        Minas Gerais prov., Brazil, South America, or

        Bloomington, Monroe County, Indiana, U.S.A., North America.

    (b) As a pair of numbers (LO, LA, i.e., longitude, latitude),
        indicating a point within a particular political unit,
        such as:

        W086°33' N39°07', or

        W07J°or N2J°23 *

    Although other methods of stating geographic locations are used,
    since any geographic location can be expressed in one of the two
    ways described above, these two are the only ones incorporated
    into the MOD system.



(2) VALUE (VAL):

    Definition: An alphabetic and/or numeric symbol expressing
    the precise character/condition of that aspect/factor (of
    the disease/environmental situation) being considered at
    (the LOC of) the specific data point.

    Examples: 0. 1. 2. 3. 0.05. 0-10. 15-35., or

    Absent. Present. Rare. Common. Abundant. Shale., or
    Tropical rainforest.

(3) FACTOR:

    Definition: Alphabetic and/or numeric symbols naming/describing
    exactly what part/aspect/facet of the total disease/environmental
    situation is being evaluated (i.e., given a VAL) at (the LOC of)
    the specific data point.

    Example: Infection rate of schistosomiasis due to Schistosoma
    mansoni in man during the period 1940 to 1960.

The functions of the three different parts of each data point become
quite clear when the process of making a map is considered in detail. First,
the cartographer selects from the entire data pool at his disposal a set of
data points all of which involve the same aspect of the
disease/environmental situation (i.e., which all have the same factor
specified). For example, he may select a set of data points all of which
concern "total rainfall (inches) during 1966". Second, he plots the position
of each data point on a geographic-coordinate grid or base map, utilizing
the location (LOC) given for the particular data point. Third, he writes the
value (VAL) of each data point next to its plotted position; for example, in
mapping "total rainfall (inches) during 1966", he writes the values "43",
"55", "17", etc., next to the plotted X's that mark the locations of the
data points. Finally, he examines this rough map and draws, around the
plotted data points, whichever cartographic symbols he thinks will best
present the data.

To summarize, the data point's LOCATION (LOC) describes where a
disease/environmental situation was studied. The FACTOR specifies what
aspect of the disease/environmental situation was studied. The VALUE (VAL)
gives the result or conclusion reached by the studies. Thus the data point
is the combination of a specific location (LOC), a specific factor, and a
specific value (VAL).
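
The three-part data point and the first steps of the map-making procedure can be sketched in code. This is only an illustration under stated assumptions: the class name `DataPoint`, the function `select_for_map`, and the station names are hypothetical and do not come from the MOD system itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataPoint:
    loc: str     # LOCATION (LOC): where the situation was studied
    factor: str  # FACTOR: what aspect of the situation was studied
    val: str     # VALUE (VAL): the result reached by the study

def select_for_map(pool, factor):
    """Pick the data points sharing one factor and pair each location with its value.

    This mirrors selecting a set of points, plotting each LOC, and writing
    each VAL next to its plotted position.
    """
    return [(p.loc, p.val) for p in pool if p.factor == factor]

# Hypothetical data pool; the station names are placeholders.
pool = [
    DataPoint("station A", "total rainfall (inches) during 1966", "43"),
    DataPoint("station B", "total rainfall (inches) during 1966", "17"),
    DataPoint("station A", "vegetation type", "Savanna"),
]
plotted = select_for_map(pool, "total rainfall (inches) during 1966")
```

The final step, choosing cartographic symbols, is a matter of the cartographer's judgment and is not modeled here.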


The concept of "factor" requires more elaboration. Any "factor" is
a disease/environmental description of some kind; however, several different
types  of  descriptions  can  be  distinguished.  For  the  purposes  of  the  MOD 
system  we  define  the  following  four  types  of  factors: 

(1) LOW-ORDER FACTOR (LOF):

Definition: The most specific possible name or description
of  a  particular  disease/environmental  situation. 

Examples: Occurrence. Abundance. Point prevalence. Period
prevalence. Incidence. Inches. Leptospirosis. Schistosomiasis.
L. pomona. L. canicola. S. mansoni. S. japonicum.
Raccoons. Skunks. Foxes. Clinical observations. Isolation
from urine. Isolation from tissue. Serologic tests. Mean
total annual rainfall. Maximum recorded July rainfall.
Savanna. Taiga. Sewer workers. Rice farmers. Cane cutters.
Swineherds. Limestone. 1961. 1900-1950.

(2)  MIDDLE-ORDER  FACTOR  (MOF): 

Definition: The set of all LOF's which describe the same
aspect/facet of disease/environmental situations.

Examples:

MOF                        LOF's Making Up The MOF

Measure                    Occurrence. Abundance. Point prevalence.
                           Period prevalence. Incidence. Inches.

General kind of disease    Leptospirosis. Schistosomiasis.

Specific disease agent     L. pomona. L. canicola. S. mansoni.
                           S. japonicum.

Animal host infected       Raccoons. Skunks. Foxes.

Method of diagnosis        Clinical observations. Isolation from
                           urine. Isolation from tissue.
                           Serologic tests.

Precipitation              Mean total annual rainfall.
                           Maximum recorded July rainfall.

Vegetation                 Savanna. Taiga.

Occupational group         Sewer workers. Rice farmers.
                           Cane cutters. Swineherds.

Bedrock                    Limestone. Granite.

Time period for which      1960. 1961. 1962. 1963. 1964.
data applies               1900-1950.


(3) HIGH-ORDER FACTOR (HOF):

Definition: A specific combination of LOF's in which each
LOF belongs to (is drawn from) a different MOF, i.e., a
specific combination of LOF's to which no MOF contributes
more than one LOF.


Examples: (In these examples, words standing alone are
LOF's; words in brackets "[ ]" are MOF's; and words in
italics are connectors.)

HOF #1 = Incidence [measure] of leptospirosis [general
kind of disease] due to L. pomona [specific
disease agent] in skunks [animal host infected]
as determined by isolation from urine [method
of diagnosis] during 1962 [time period for
which the data applies].

HOF #2 = Inches [measure] of mean total annual rainfall
[precipitation] for 1963 [time period for which
the data applies].

HOF #3 = Abundance [measure] of raccoons [animal host
infected] in taiga [vegetation] on limestone
[bedrock] during 1965 [time period for which the
data applies].


(4)  POLY-ORDER  FACTOR  (POF)  : 

Definition: A specific combination of LOF's in which at
least two LOF's belong to (are drawn from) the same MOF,
i.e., a specific combination of LOF's to which at least one
MOF contributes more than one LOF.

Examples:

POF #1 = Period prevalence [measure] of leptospirosis
[general kind of disease] due to L. pomona and
L. canicola [specific disease agent] in foxes
[animal host infected] as determined by clinical
observations, isolation from tissue, and serologic
tests [method of diagnosis] for 1960, 1961, 1962,
1963, 1964, and 1966 [time period for which the data
applies].

POF #2 = Inches [measure] of mean total annual rainfall
[precipitation] for 1900-1930 [time period for which
the data applies].
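
The distinction between a HOF and a POF reduces to a counting rule: does any MOF contribute more than one LOF? The sketch below illustrates that rule only; the function name `classify_factor` and the representation of a factor as a list of (LOF, MOF) pairs are assumptions made for this example.

```python
from collections import Counter

def classify_factor(lofs):
    """Classify a factor given as a list of (LOF, MOF) pairs.

    Returns "HOF" when every MOF contributes at most one LOF,
    and "POF" when at least one MOF contributes two or more.
    """
    if not lofs:
        raise ValueError("a factor needs at least one LOF")
    per_mof = Counter(mof for _lof, mof in lofs)
    return "POF" if any(n > 1 for n in per_mof.values()) else "HOF"

# HOF #2 from the examples above.
hof2 = [
    ("Inches", "measure"),
    ("Mean total annual rainfall", "precipitation"),
    ("1963", "time period for which the data applies"),
]

# Part of POF #1: two LOF's drawn from the same MOF (specific disease agent).
pof1_part = [
    ("Period prevalence", "measure"),
    ("Leptospirosis", "general kind of disease"),
    ("L. pomona", "specific disease agent"),
    ("L. canicola", "specific disease agent"),
    ("Foxes", "animal host infected"),
]
```

Here `classify_factor(hof2)` yields "HOF" and `classify_factor(pof1_part)` yields "POF".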

LOF's, MOF's, HOF's, and POF's, together, can be viewed either as a
kind  of  hierarchy  or  as  a  kind  of  matrix: 


[Figure: hierarchy and matrix views of LOF's, MOF's, HOF's, and POF's.]


There  are  two  kinds  of  MOF's: 

(1)  C-MOF’s  (common  -  MOF's),  and 

(2)  O-MOF's  (optional  -  MOF's) 


C-MOF's are those MOF's which should, and usually do, accompany (as
necessary descriptive elements), or should be common to, every data point
(i.e., every bit of mappable data), regardless of what aspect of the
disease/environmental situation the data point concerns. However, C-MOF's
are not absolutely essential for mapping data; a data point can be plotted/
mapped just so long as it has a specific location (LOC), a specific value
(VAL), and at least some specific factor (which may or may not include some
or all of the C-MOF's). Because C-MOF's should fit into the statement of
the factor of every data point, it is desirable to differentiate them from
all the other MOF's, i.e., the O-MOF's, which need not fit into every
disease/environmental data point (and which, in a sense then, are optional).
As a rule, O-MOF's will be part of every mappable data point, but both the
particular O-MOF's used and their number can vary widely from one mappable
data point to another.

Examples of O-MOF's include most of those MOF's given as examples
under the definition of MOF's in general; C-MOF's, on the other hand,
include these and only these MOF's:

Each C-MOF is listed below, followed by the LOF's making up the C-MOF.

(1) Security classification of the data.
    LOF's: Top secret. Secret. Confidential. Restricted (for official
    scientific use only). Unclassified.

(2) Primary source document identification. (The primary source
    document is the paper which originally reported the data.)
    LOF's: Abbreviated bibliographic citation: author(s), date,
    journal/book, volume, page. (This MOF will always be used, whether
    the data comes from primary or from secondary source documents.)

(3) Secondary source document identification. (The secondary source
    document is a paper which references or quotes data already reported.)
    LOF's: Abbreviated bibliographic citation: author(s), date,
    journal/book, volume, page. (If the data is being extracted from its
    primary source document, this MOF will be left blank or not used.)

(4) Professional evaluation of data source (i.e., author,
    organization, institution, source document, etc.).
    LOF's: More reliable. Less reliable. Reliability not assessed.

(5) Computer evaluation of data point (to be calculated internally by
    the computer).
    LOF's: (a number)

(6) Time period for which the data applies.
    LOF's: 1963. 1960-1964. June 1965. Pre-1966. 17 April 1964.

(See pp. 4-34 and -33 for further basic discussion of data structuring
terminology.)

LOF's and MOF's by themselves cannot (usually) stand alone as the
complete statement of the factor for a set of data points being mapped,
because LOF's and MOF's do not, in general, convey sufficient information.
On the other hand, HOF's and POF's can (always) be meaningfully mapped,
each HOF or POF serving as the complete statement of the factor being
mapped or as the description (title or legend) for the map of that factor.


Some idea of the possible number of mappable factors (HOF's and
POF's) can be obtained from estimates of the total number of LOF's and
MOF's available to combine into HOF's and POF's. We estimate that on the
order of 10^30 different HOF's and POF's could be constructed. Obviously,
computer handling of this tremendous number of possible factors is not only
highly desirable, but a virtual necessity.
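
To see why the number of possible factors grows so quickly, the definitions of HOF and POF can be turned into a simple count. The sketch below is an illustration, not the authors' own calculation: the function `count_factors` and the MOF sizes passed to it are hypothetical. It assumes a HOF takes zero or one LOF from each MOF (excluding the empty combination) and that every other non-empty combination of LOF's is a POF.

```python
from math import prod

def count_factors(lofs_per_mof):
    """Count possible HOF's and POF's for MOF's of the given sizes.

    lofs_per_mof: number of LOF's in each MOF (hypothetical figures).
    Returns (number of HOF's, number of POF's).
    """
    total_lofs = sum(lofs_per_mof)
    # Each MOF contributes either nothing or exactly one of its LOF's;
    # subtract 1 to drop the combination that uses no LOF at all.
    hofs = prod(n + 1 for n in lofs_per_mof) - 1
    # Every non-empty subset of all LOF's is either a HOF or a POF.
    all_combinations = 2 ** total_lofs - 1
    return hofs, all_combinations - hofs

# Three small, made-up MOF's with 6, 2, and 4 LOF's.
hofs, pofs = count_factors([6, 2, 4])
```

Under these assumptions a pool of only 100 LOF's already allows 2^100, roughly 1.3 x 10^30, combinations, which is the same order of magnitude as the estimate quoted above.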


In special cases certain elements can be treated either as LOF's or
as VAL's. If a specific element is treated as a LOF, that element will
appear as part of a particular HOF/POF. If the same element is treated as
a VAL, the element will also appear as part of a HOF/POF, but it will be
a different one. In other special cases a HOF can contain only one O-MOF,
which, in turn, can contain only one LOF. Putting it a different way,
sometimes a data-point factor can consist of but a single description or name.
Such a factor, in terms of the data structure that we have discussed in
detail, can be viewed as a uni-LOF/uni-(O-)MOF/HOF, or, in other terms, as
a LOF that is also an (O-)MOF which, in turn, is a (mappable) HOF. In our
experience these special cases have been limited to environmental (not
disease) situations. As an example of this kind of special case, consider a
data point where the vegetation type is tropical rainforest; the mappable
data point could be either:

In Brazil (LOC), "common" (VAL) expressing the abundance of
schistosomiasis due to S. mansoni and S. japonicum in man
living in tropical rainforests. (The factor here is a POF
with one O-MOF being "vegetation type", and its
contributed LOF being "tropical rainforest".)

or

In Brazil (LOC), "tropical rainforest" (VAL) expressing the
vegetation type. (The factor here is a uni-LOF [vegetation
type] uni-O-MOF [biotic communities present] HOF that has
the value, "tropical rainforest", at the location being
considered.)

Scientific  articles  and  reports  often  yield  narrative-type  informa¬ 
tion  that  cannot  be  effectively  structured  as  part  of  a  data  point,  al¬ 
though  the  information  would  contribute  to  an  increased  understanding  of 
the  data  mapped.  In  terms  of  data  structure  we  have  defined  this  category 
as: 

NARRATIVE  (NAR) : 

Definition:  Supporting,  nonmappable  prose/narrative/textual 
information  or  data  associated  with  a  specific  data  point. 

Example: "Water samples from nearby ponds were lost en route
to the laboratory."

To illustrate the use of data structured in the manner we have
discussed, here are examples of two data points and a map on which they are
presented. (In these examples words standing alone are LOF's; words in
brackets "[ ]" are MOF's; words in italics are connectors; and words in
parentheses "( )" indicate the major part of a data point.)

Data Point #1: At (LOC:) W076.7° N39.0°, Bowie, Prince Georges
County, Maryland, U.S.A., North America, (VAL:) 1/30 expresses
(factor, here, a POF:) period prevalence [measure] of leptospirosis
[general kind of disease] due to L. pomona and L. canicola [specific
disease agent] in foxes [animal host infected] during 1960-1963
[time period for which data applies] according to Hopps and
Kappas, 1967, personal communication [primary source document].

Data Point #2: At (LOC:) W077.0° N38.9°, District of Columbia,
U.S.A., North America, (VAL:) 1/10 expresses (factor, here, a
POF:) period prevalence [measure] of leptospirosis [general
kind of disease] due to L. pomona and L. canicola [specific
disease agent] in foxes [animal host infected] during 1960-
1965 [time period for which data applies] according to
Richmond and Cuffey, 1966, J. Leptospirology, v. 39, p. 107
[primary source document]; (NAR:) the majority of foxes
examined in this urban area were in the National Zoological Park.

A  map  presenting  these  two  data  points  is  shown  in  Fig.  4.1. 

The  guiding  principle  throughout  this  description  of  data-structuring 
methods/techniques  of  the  MOD  system  has  been  to  relate  all  concepts  to  their 
ultimate  application  —  the  construction  of  maps  synthesized  from  specific 
sets of data points. Consequently, attention has been focused on the
various component parts of a data point:


DATA POINT

(1) a location (LOC)

(2) a value (VAL)

(3) a factor (HOF or POF, made up of LOF's belonging to MOF's)

(4) a narrative (NAR); this is not mandatory

Parenthetically, as will be discussed later, the MOD data processing
system treats LOC, VAL, NAR, and all MOF's in essentially the same
manner.
(Factor, here a POF:) Period prevalence [measure] (of)
leptospirosis [general kind of disease] (due to) L. pomona and
L. canicola [specific disease agent] (in) foxes [animal host
infected] (during) 1960-65 [time period for which data apply]
(according to) all sources [primary source document].

(NAR:) * Foxes here tend to be scared away by numerous tourists,
so that the value determined may be low.


Figure 4-1. Map constructed from illustrative data points #1 and #2, given
on preceding page, to show the function of various portions of the structured
MOD data.
4.2 FACTOR CATALOGUE

4.2.1  DISEASE  FACTORS 

Disease data sought for input to the MOD system can be structured
according to the entries in this section. As previously described, the
data must be cast in the form of distinct data points, each containing a
geographic location (LOC), value (VAL), factor statement, and supporting
narrative (NAR). The factor statement is constructed by combining LOF's
from many of the MOF's specified herein.

Throughout  this  section  the  following  format  is  utilized: 

( MOF  code  designation)  —  MOF  name  or  textual  description . 

Examples  of  LOF's  (names  or  textual  descriptions)  belonging 
to  this  particular  MOF 

Explanatory  comments  regarding  this  particular  MOF 

The precise format used here meets MOD data-collecting and data-processing
requirements, as discussed extensively later in this report. The MOF's in
this section are listed alphabetically by their 3-letter code designations
so that this catalogue may serve as a guide for MOD data extractors.

Almost  every  MOF  described  herein  can  have  as  LOF's,  "other"  and 
"unspecified/undifferentiated"  —  consequently,  these  particular  LOF's  are 
not  listed  under  each  MOF.  Furthermore,  every  MOF  can  have  "unknown"  as  a 
LOF.  Because  of  the  manner  in  which  MOD  data  processing  will  be  performed, 
this  particular  LOF  is  entered  implicitly  simply  by  omitting  the  MOF  in 
question from the data-point factor statement, i.e., not filling in the
appropriate  blanks  on  MOD  data  extraction  forms. 

The  MOD  disease  factors  are  as  follows: 




(AID)  Domestic  status  of  animal  infected. 

Domesticated. 

Wild  (feral). 

Wild  (native) . 

(AIP) Precise identity of animal infected.

Homo  sapiens. 

Man. 

Dog. 

Felis leoparda.

Microtus pennsylvanicus (meadow vole).

Raccoons . 

Stink-pot  turtle. 

Bears (ursids; Ursidae).

Mice  (suborder  Myomorpha) . 

Culicinae (culicine mosquitoes).

Allowance  must  be  made  for  over  2,000  LOF's,  which  will 
be  arranged  in  a  hierarchical  tree-structure . 

(AVC)  Average  course  of  disease  in  this  outbreak. 

Acute  (i.e.,  sudden  onset,  sharp  rise,  short  duration). 
Subacute. 

Subchronic.

Chronic  (i.e.,  gradual  onset,  gradually  progressive, 

long  or  indefinite  duration,  or  frequently  recurring). 


(AVD) Average duration of disease in this outbreak.

7  days. 

153  days. 

(AVS)  Average  severity  of  disease  in  this  outbreak . 

Fatal. 

Severe  clinical. 

Moderate  clinical. 

Mild  clinical. 

Asymptomatic  /  subclinical. 

(CEN)  Computer  evaluation  of  data  point  (to  be  calculated 
automatically  by  the  MOD  computer  system;  a  C-MOF) . 
(CRD) Domestic state of carrier.

Same as (AID).

(CRP) Precise identity of carrier.

Same as (AIP).


(No code designation) Data point number.

661223RJC039.

670815JHC173.

This number is not actually a "factor" in the sense in
which the MOD data structure was defined previously,
but it is necessary for MOD data-processing operations.
No code designation is required because these characters,
by themselves, will signal the beginning of a
new data-point record within the MOD system. Each
data point must have a unique designation; this
requirement is satisfied by entering a different
number on each data extraction form completed. The
number consists of the last two digits of the year,
two digits for the month, and two digits for the day on
which the data form was filled out, followed by three
initials of the data extractor's name, then by three
digits which indicate that the data point is the nth
point extracted by that particular extractor on that
specific date.
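
The layout of the data point number (two-digit year, month, and day, three initials, three-digit sequence number) can be sketched as follows. The function names `make_data_point_number` and `split_data_point_number` are hypothetical, and the restriction to upper-case initials and a sequence number of 1 to 999 is an assumption drawn from the two examples above.

```python
import re
from datetime import date

# Two digits each for year, month, day; three initials; three-digit sequence.
_DATA_POINT_NUMBER = re.compile(r"^(\d{2})(\d{2})(\d{2})([A-Z]{3})(\d{3})$")

def make_data_point_number(filled_out, initials, nth):
    """Build a number such as 661223RJC039 from a date, initials, and sequence."""
    if not re.fullmatch(r"[A-Z]{3}", initials):
        raise ValueError("initials must be three upper-case letters")
    if not 1 <= nth <= 999:
        raise ValueError("sequence number must fit in three digits")
    return f"{filled_out:%y%m%d}{initials}{nth:03d}"

def split_data_point_number(number):
    """Return (year, month, day, initials, nth) from a data point number."""
    match = _DATA_POINT_NUMBER.match(number)
    if match is None:
        raise ValueError(f"not a data point number: {number!r}")
    yy, mm, dd, initials, nth = match.groups()
    return int(yy), int(mm), int(dd), initials, int(nth)

example = make_data_point_number(date(1966, 12, 23), "RJC", 39)
```

Because the date, the extractor, and the sequence number are all embedded, two extractors working on the same day cannot produce the same number.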


(DMS) Disease measure (method of indicating extent of disease
within population, i.e., "epidemiological index").

Occurrence (necessitates VAL's of "present/absent").

Abundance (necessitates VAL's of "absent/rare/common/abundant").

The following measures necessitate VAL's which are ratio, fraction,
or percentage numbers, e.g., 1:4, 1/250, 0.07, or 13%:

Point prevalence. (Dorn, 1957).

Period prevalence. (Dorn, 1957).

Incidence. (Dorn, 1957).

Mortality.

Standardized mortality ratio. (Howe, 1963).

Infection rate.

Case rate.

Attack rate.

Hospital-admission rate.

The following measures necessitate VAL's which are absolute numbers,
e.g., 15, 749, or 136:

Number of cases existing at specific
point in time (i.e., a day) (= CP).

Number of cases existing at any time
during specific time interval (i.e.,
a week, month, or year), including
both those which began before and
those which began after start of
time interval (= CBA).

Number of cases beginning during
specific time interval, including
only those which began after start
of time interval (= CA).

Number of deaths occurring during
specific time interval (= DI).

The disease measures requiring ratio/fraction/percentage numbers
as VAL's, such as point prevalence, period prevalence, incidence,
etc., are so frequently utilized imprecisely and ambiguously in
the literature that it may be necessary (with some groups of
data) to synonymize them simply as "Morbidity". Similarly, the
various measures which are stated as numbers of cases may also
have to be synonymized as simply "Number of cases". At this
time, however, the definitions given by Dorn (1957) will be
followed for point prevalence (= CP/P), period prevalence
(= CBA/P), incidence (= CA/P), and mortality (= DI/P), where P =
population examined. Standardized mortality ratios are
calculated from standard health statistics by several arithmetic
manipulations as outlined by Howe (1963, pp. 3-6).

(DOR) Duration of outbreak reported.

6 days.

30 days.
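
The four ratio measures given under (DMS), following Dorn (1957), are each a count divided by the population examined. The sketch below works them through for a made-up outbreak; the function name `ratio_measures` and all of the counts are hypothetical, and exact fractions are used so that values keep the "1/250" form mentioned for VAL's.

```python
from fractions import Fraction

def ratio_measures(cp, cba, ca, di, p):
    """Compute the four ratio measures from case counts.

    cp  = cases existing at a specific point in time
    cba = cases existing at any time during the interval
    ca  = cases beginning during the interval
    di  = deaths occurring during the interval
    p   = population examined
    """
    if p <= 0:
        raise ValueError("population examined must be positive")
    return {
        "point prevalence": Fraction(cp, p),    # CP/P
        "period prevalence": Fraction(cba, p),  # CBA/P
        "incidence": Fraction(ca, p),           # CA/P
        "mortality": Fraction(di, p),           # DI/P
    }

# Hypothetical counts for a population of 500 examined.
measures = ratio_measures(cp=10, cba=25, ca=15, di=1, p=500)
```

With these made-up counts the point prevalence is 1/50 and the period prevalence is 1/20; the period figure is the larger because it also counts cases that began before the interval started.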


(ESP)  Epidemiologic  state  of  disease  within  population. 

Endemic/enzootic (i.e., disease continuously present at low rate).

Hyperendemic/hyperenzootic  (i.e.,  disease  continuously  present 
at  high  rate). 

Sporadic (i.e., disease intermittently present, only at low rate).

Epidemic/epizootic  (i.e.,  disease  intermittently  present,  at  high 
rate,  in  a  small  area  —  or  disease  continuously  present  but  now 
at much higher rate than usual, in a small area).

Pandemic/panzootic  (i.e.,  disease  intermittently  present,  and  at 
high  rate,  over  very  wide  region  —  or  disease  continuously 
present  but  now  at  much  higher  rate  than  usual,  over  very  wide 
region) . 
(FOP) Frequency of outbreaks preceding outbreak reported.

No  outbreaks  previously  reported. 

Outbreaks  rare/occasional/''°iaum. 

Outbreaks  common/frequent. 

Outbreaks  very  common/very  frequent. 

(GKD) General kind of disease.

Leptospirosis (= 7-day fever = Weil's disease/syndrome =
Ft. Bragg fever = spirochetal jaundice = pea-harvest
fever = water fever = (sugar-)cane fever = rice-field
fever = harvest fever = swamp fever = swineherd's
disease = mud-harvest-field fever = brushy creek fever =
pretibial fever = field fever = mud fever = autumnal
fever).

Schistosomiasis . 

Rabies . 

Malaria. 

Hemorrhagic  fever. 

Dengue . 

Cholera. 


(HSC)  Type  of  human  settlement  where  outbreak  occurred. 

Urban  or  large  city. 

Suburban  area. 

Small  town. 

Densely-settled  rural  area. 

Sparsely-settled  rural  area. 

(IHD) Domestication of intermediate host.

Same as (AID).

(IHP) Precise identity of intermediate host.

Same as (AIP).

(IMM) Prior state of immunity of animals infected in this outbreak.

Susceptible/not immune.

Naturally immune.

Artificially immunized.

(KOR) Kind of outbreak reported.

Isolated case (1).

Small group of cases (2-29).

Large group of cases (30+).
individuals chosen for examination or testing, in the broadest
sense).

Random survey of townspeople.

Sick  people  visiting  local  physicians  for  any  illness. 

Patients  admitted  to  hospital  for  suspected  leptospirosis. 
Complete  survey  of  military  inductees. 

Complete  survey  of  dairy  herds. 

People  living  on  the  same  side  of  a  street. 

Persons  of  a  particular  occupation. 


(LDO)  Lethality  of  disease  in  this  outbreak. 

Always  fatal. 

Often  fatal. 

Seldom  fatal. 

Rarely fatal.

Never  fatal. 

(LGD) Relative level of generalization of data.

Data  very  generalized  and  broad. 

Data  intermediate  in  generality. 

Data  highly  specific. 


(LOC) Geographic location of data point (i.e., where did the cases occur?)

North  America,  United  States,  Pennsylvania,  Centre  County, 

Pine  Grove  Mills. 

North  America,  United  States,  Maryland,  Anne  Arundel  County, 

W  076°  29'  N  38°  59'. 

Europe,  France,  Dordogne,  _ ,  Les  Eyzies. 

Africa,  Ghana,  Accra  Prov., _ ,  A. E. Prince's  lot 

near  Teshi. 

This is not actually a "factor" in the sense in which
the MOD data structure was defined previously, but it
is  necessary  for  MOD  processing  operations.  Allowance 
must  be  made  for  over  10,000  different  entries,  entries 
which  will  be  arranged  in  a  hierarchical  tree-structure. 
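
A hierarchical tree-structure for location entries can be pictured as nested levels running from continent down to the most specific place. The sketch below is one possible representation, not the MOD system's actual storage scheme; the function name `insert_location` and the use of nested dictionaries are assumptions. The underscore that marks an unfilled level in the examples above is kept as an ordinary level.

```python
def insert_location(tree, entry):
    """Insert a comma-separated LOC entry into a nested-dictionary tree.

    Each level (continent, country, state/province, county, place)
    becomes one dictionary key beneath its parent level.
    """
    levels = [part.strip() for part in entry.rstrip(".").split(",")]
    node = tree
    for level in levels:
        node = node.setdefault(level, {})
    return tree

loc_tree = {}
insert_location(
    loc_tree,
    "North America, United States, Pennsylvania, Centre County, Pine Grove Mills.",
)
insert_location(
    loc_tree,
    "North America, United States, Maryland, Anne Arundel County, W 076° 29' N 38° 59'.",
)
insert_location(loc_tree, "Europe, France, Dordogne, _, Les Eyzies.")
```

Entries that share their upper levels, such as the two United States examples, share the same branch of the tree and diverge only at the state level.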


(LOG)  Occupational  groups  in  largest  sample  involved. 

Sewer  workers 
Rice-field  farmers. 

Cane-cutters. 

Butchers . 

College/university students.
(LRK) Racial/ethnic/breed groups in largest sample involved.

Scandinavian  Europeans. 

Nilotic  Negroes. 

Italian  Americans. 

Hereford  cattle. 

Chihuahua  dogs. 

Tagalog  tribespeople . 


(LSA) Ages in largest sample involved.

Adolescent . 

Adult . 

22-28  yrs. 

0-6  mos. 

(LSX)  Sexes  in  largest  sample  involved . 

Male 
Female . 


(LSZ) Size of largest sample involved (largest number of individuals
examined,  i.e.,  how  many  individuals  were  examined  or  tested  in 
the  broadest  sense?). 

703. 

514. 

37. 

(MDC)  Method  of  diagnosis. 

Clinical  observation. 

Isolation  of  disease  agent,  source  unspecified. 

Isolation  of  disease  agent  from  water. 

Isolation  of  disease  agent  from  soil. 

Isolation of disease agent from urine.

Isolation of disease agent from blood.

Isolation of disease agent from tissue.

Serologic test, single specimen (or non-rising titre).

Serologic test, multiple specimens, rising titre.

Skin test.

Xenodiagnosis.

Biopsy. 

Autopsy. 
(MFD) Medical facilities involved in diagnosis.

Military hospital/clinic.

University/academic hospital/clinic.

Large/urban hospital/clinic.

Small/rural hospital/clinic.

Individual  physician. 

Nurse/paramedical  person. 

Folk/witch  doctor. 

None. 

Roving expedition/field study.


(MFT) Medical facilities involved in treatment during this outbreak.

Same as (MFD).

(MND) Minimum duration of cases in this outbreak.

Same as (AVD).


(MNS) Minimum severity of disease in this outbreak.

Same as (AVS).

(MRG)  Manner  of  reporting/grouping  data. 

Data  not  grouped  —  reported  as  individual  cases. 

Data  grouped  and  reported  by  county. 

Data  grouped  and  reported  by  state/province. 

Data grouped and reported by country/colony/dependency.


(MTR) Method of transmission of disease to animal infected.

Direct contact with living infected animals.

Direct contact with dead animal, tissue, or blood.

Direct contact with excreta (including urine).

Indirect occupational contact with water.

Indirect recreational contact with water.

Indirect domestic contact with water.

Indirect occupational contact with soil.

Indirect  recreational  contact  with  soil. 

Indirect  domestic  contact  with  soil. 

Bite  of  carrier  or  vector. 


(MXD) Maximum duration of cases in this outbreak.

Same as (AVD).

(MXS) Maximum severity of disease in this outbreak.

Same as (AVS).


(NAR) Supporting narrative information for data point.

Water  samples  taken  during  expedition  were  contaminated 
accidentally  before  laboratory  processing. 

Observations  within  this  tract  of  jungle  led  to  a 
mapping  survey  several  months  later. 

This is not actually a "factor" in the sense in
which the MOD data structure was defined
previously, but it represents significant
information that can be made available to the
MOD user.

(PES) Professional evaluation of data source (i.e., author, organization,
institution, source document, etc.; a C-MOF).

More  reliable. 

Less  reliable. 

Reliability  not  assessed. 


(PFD)  Pattern  of  fluctuations  in  disease  situation  over  long  periods  of 
time. 

Disease  continuous,  with  no  significant  peaks  (peak  a  3  x 
average  value) . 

Disease  continuous,  but  with  significant  peaks. 

Disease  intermittent  and  seasonal,  within  a  year. 

Disease  intermittent  and  non-seasonal,  but  within  a  year. 

Disease intermittent and seasonal, but non-yearly.

Disease intermittent and non-seasonal, and non-yearly.


(PHD) Domestic state of primary host.

Same as (AID).


(PHP) Precise identity of primary host.

Same as (AIP).

(PSD) Primary source document identification (abbreviated bibliographic
citation; a C-MOF).

Smith & Jones, 1968, J. Zool., v. 13, p. 51.

Kappus, 1968, Leptospirosis in Mice, p. 157.

Hopps, 1966, Princ. of Path., pp. 13-17.

Brown, 1951, Geol. Bull., v. 51, pp. 571-703.

Richmond, 15 Feb 67, MOD System & Data Analysis Results
(Final Report from PRC), p. 13.

The primary source document is that in which the
data was originally reported.

(RSD)  Domestic  status  of  reservoir. 

Same  as  (AID) . 

(RSP) Precise identity of reservoir.

Same as (AIP).

(SBS) Basis for sampling for smallest sample involved (i.e., how were
the individuals chosen for examination or testing in the
narrowest sense?).

Same as (LBS).

(SDA) Specific disease agent.

Leptospira, serotype not determined or not specified.

Leptospira, all serotypes.

Leptospira pomona.

Leptospira canicola.

Schistosoma mansoni.

Plasmodium falciparum.

Vibrio cholerae.

Omsk hemorrhagic fever (OHF) agent.

Agents are to be arranged in a hierarchical tree-structure,
with allowance for about 125 leptospiral
agents and about 80 hemorrhagic-fever agents.

(SEA)  Season  for  which  data  applies. 

Winter. 

Spring. 

Summer. 

Fall. 

Deer-hunting  season. 

Wet  season. 

Dry  season. 
(TOD) Time of day for which data applies.

Morning-night. 

Dawn. 

Morning. 

Mid-day . 

Afternoon. 

Dusk. 

Evening-night. 

Mid-night. 

(TSC)  Topographic  situation  of  cases  in  outbreak. 

Along river/valley bottoms.

On  plateaus. 

On  mountains,  just  below  tops. 


(UDS) Unique designator for combination of smallest and largest
samples  involved. 

670502RJC073 . 

680923HCH107 . 

The matter of sampling to determine the value of a data
point is extremely complex, and is best explained by
using a simple example.

Assume that, in a town of 10,000 people, 1,000 men have
been inducted into the military; 800 of these have
been examined medically in some way or other, 500 of
whom have been examined serologically for leptospirosis;
10 are found to have L. pomona antibodies, and 5
are found to have L. canicola antibodies, and 2 are
determined to have leptospirosis, clinically, but were
not tested serologically.

There are actually three separate data points which
must be extracted separately from the above situation:

(1) 10/500 point prevalence of L. pomona serologically.

(2) 5/500 point prevalence of L. canicola serologically.

(3) 2/800 or, possibly, 2/1000 point prevalence of
L., serotypes undetermined, diagnosed clinically.

Commonly, when such studies are published, one or more
of the figures, 1000, 800, and 500, are omitted. Even
if they were given, the fact that all these numbers
relate to essentially the same sampling situation might
become obscured if the data points were entered into
the MOD system separately. To avoid this, a unique
designator is required for the combination of
samples (the 1000, 800, and 500) involved in this
situation. Such a unique designator can be constructed
from the date (two digits for year, two
digits for month, and two digits for day), three
letters for extractor's initials, and three digits
indicating that this was the nth sample encountered
by the extractor on that day.

The unique designator is required for computerized
combination of these data points. For example,
given the same unique sample designator for the
3 data points above, the computer system can then
combine them correctly to get (10 + 5)/500 point
prevalence of L. serologically (rather than
(10 + 5)/(500 + 500)), and (10 + 5 + 2)/800 to
(10 + 5 + 2)/1000 point prevalence of L. serologically
and/or clinically (rather than (10 + 5 + 2)/(500 + 500 + 800)
to (10 + 5 + 2)/(500 + 500 + 1000)), in this particular
leptospirosis situation.
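
The construction of the designator, and the way it lets case counts be added before dividing once, can be sketched as follows. This is only an illustrative sketch: the function names make_designator and combine are hypothetical and are not part of the MOD system, and the tuple layout (designator, cases, sample size) is assumed for the example.

```python
from datetime import date
from fractions import Fraction


def make_designator(day: date, initials: str, nth: int) -> str:
    """Build YYMMDD + three extractor initials + three-digit daily sequence."""
    if len(initials) != 3 or not initials.isalpha():
        raise ValueError("extractor initials must be exactly three letters")
    if not 1 <= nth <= 999:
        raise ValueError("sequence number must fit in three digits")
    return f"{day:%y%m%d}{initials.upper()}{nth:03d}"


def combine(points, designator, denominator):
    """Add the case counts filed under one designator, then divide once.

    points: iterable of (designator, cases, sample_size) tuples (assumed layout).
    denominator: the shared sample size chosen for the combined figure.
    """
    cases = sum(n for d, n, _ in points if d == designator)
    return Fraction(cases, denominator)


# The three data points from the town example, all under one designator.
uid = make_designator(date(1967, 5, 2), "RJC", 73)
points = [
    (uid, 10, 500),  # L. pomona, serologic
    (uid, 5, 500),   # L. canicola, serologic
    (uid, 2, 800),   # serotype undetermined, clinical
]

serologic = combine(points[:2], uid, 500)   # (10 + 5)/500
either = combine(points, uid, 800)          # (10 + 5 + 2)/800
naive = Fraction(10 + 5, 500 + 500)         # what separate entry would give
```

Without the shared designator, nothing tells the system that the two serologic counts come from the same 500 men, which is how the halved "naive" figure arises.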

(No code designation) Security classification of data (a C-MOF).

Top  secret. 

Secret. 

Confidential. 

Restricted  —  for  official  scientific  use  only. 

Unclassified. 

No code designation is required, because security
matters must be handled manually rather than by
the MOD computer system itself.


(SOG) Occupational groups in smallest sample involved.
Same as (LOG).


(SRE) Racial/ethnic groups in smallest sample involved.
Same as (LRE).


(SSA) Ages in smallest sample involved.
Same as (LSA).


(SSD) Secondary source document identification (abbreviated
bibliographic citation; a C-MOF).

Same as (PSD).
The secondary source document is one which
references or quotes data already reported
elsewhere.

(SSX) Sexes in smallest sample involved.

Same as (LSX).


(SSZ) Size of smallest sample involved (smallest number of individuals
examined, i.e., how many individuals were examined or tested in
the narrowest sense?).

Same as (LSZ).


(TGA)  Treatments  given  to  animals  infected. 

300,000 units K phenoxymethyl penicillin q.i.d. for 7 days.
Nodules  surgically  removed. 


(TIM) Time period for which the data applies (i.e., when did the cases
occur?; a C-MOF).

63 (i.e., 1963).

6306,6411 (i.e., Jun 1963 - Nov 1964).

630617 (i.e., 17 Jun 1963).
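
As a rough sketch of how such time-period codes might be read by machine, the following assumes the digits run year, month, day (two digits each), that a comma separates the start and end of a range, and that every year falls in the 1900s. The function names are hypothetical and are not taken from the MOD programs.

```python
import calendar
from datetime import date


def _parse_part(code: str, end: bool) -> date:
    """Turn YY, YYMM, or YYMMDD into the first or last day it covers."""
    if len(code) not in (2, 4, 6) or not code.isdigit():
        raise ValueError(f"unrecognized TIM code: {code!r}")
    year = 1900 + int(code[0:2])  # assumption: all dates are in the 1900s
    if len(code) >= 4:
        month = int(code[2:4])
    else:
        month = 12 if end else 1
    if len(code) == 6:
        day = int(code[4:6])
    else:
        # No day given: use the first or last day of the month.
        day = calendar.monthrange(year, month)[1] if end else 1
    return date(year, month, day)


def parse_tim(code: str):
    """Return (start, end) dates for a single code or a 'start,end' pair."""
    parts = [p.strip() for p in code.split(",")]
    if len(parts) == 1:
        return _parse_part(parts[0], end=False), _parse_part(parts[0], end=True)
    if len(parts) == 2:
        return _parse_part(parts[0], end=False), _parse_part(parts[1], end=True)
    raise ValueError(f"too many parts in TIM code: {code!r}")
```

Expanding every code to an explicit start and end date makes a whole year, a span of months, and a single day directly comparable when data points are selected by time period.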

(VAL)  Value  for  data  point. 

1. 

33.

Absent.

Rare.

Tropical rainforest.

Probably present.

This is not actually a "factor" in the sense in which
the MOD data structure was defined previously, but it
is necessary for MOD processing operations. The
particular value entered for a data point must be
checked to ensure that it is compatible with the factor
statement (particularly the disease measure, DMS).


(VCD)  Domestic  status  of  vector. 
Same  as  (AID). 

(VCP) Precise identity of vector.
Same as (AIP).
4.2.2 ENVIRONMENTAL FACTORS

Environmental data sought for the MOD system can also be cast in
terms of the MOD data structure, utilizing data points consisting of LOC
(location), VAL (value), a factor statement (constructed by combining
LOF's from many of the MOF's listed), and NAR (narrative); these
factors for environmental data are the same as previously described for
disease data.

Other (O-)MOF's can be constructed according to the same format as
disease-related MOF's. For example:

(oMZ) Biogeographic distribution measure.

Occurrence (necessitates VAL's "absent/present").

Abundance (necessitates VAL's "absent/rare/common/abundant").

Number of individuals seen (necessitates VAL's such as "137").

(SMP) Precise identity of small mammal considered.

Same as disease MOF (AIP).
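
The dependence of permissible VAL's on the distribution measure chosen can be illustrated with a small check. This is a sketch only: the table ALLOWED_VALUES and the function val_is_compatible are hypothetical names, and the three measures shown are simply those of the example above.

```python
# Permissible qualitative values for each measure, as given in the example.
ALLOWED_VALUES = {
    "occurrence": {"absent", "present"},
    "abundance": {"absent", "rare", "common", "abundant"},
}


def val_is_compatible(measure: str, val: str) -> bool:
    """Report whether a VAL can legitimately accompany the given measure."""
    measure = measure.strip().lower()
    val = val.strip().lower()
    if measure in ALLOWED_VALUES:
        return val in ALLOWED_VALUES[measure]
    if measure == "number of individuals seen":
        # A count must be a whole number such as "137".
        return val.isdigit()
    raise ValueError(f"no compatibility rule recorded for {measure!r}")
```

For instance, "rare" is acceptable under Abundance but not under Occurrence, and "137" is acceptable only under Number of individuals seen.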

Because of the enormous number of possible environmental MOF's, this
catalogue of environmental factors lists only broad statements as to the
kind of data desired rather than precise MOF's. However, precise MOF's
can be readily formulated from this catalogue when requirements for
specific environmental data arise. Physical/chemical-environmental
factors are listed first, then biologic-environmental factors and, finally,
human-environmental factors.


The  MOD  environmental  factors  are  as  follows: 
Bedrock:

Types (granite, limestone, schist, etc.)

Structure (flat-lying, folded, block-faulted, etc.)
Chemical/mineral content (including major elements,
trace elements, etc.)

Soil:

Types (latosol, sierozem, podzol, etc.)
Chemical/mineral content (including major elements,
trace elements, pH, etc.)

Temperature (at surface, at 1-foot depth, etc.)
Moisture content, porosity, permeability

Erosion: 

Severity 

Type  (sheet,  gully,  etc.) 

Topography: 

Elevation/altitude
Relief 

Slopes  (abundance,  steepness,  orientation) 

Landforms  (mountain,  valley,  plain,  etc.) 


Water: 

Availability  (especially  of  potable  water) 

Types  (soil  water,  surface  water,  ground  water,  etc.) 
Physical/chemical characteristics (including pH, salinity,
temperature, turbidity, hardness, major and trace
elements, especially oxygen content and carbonate content)

Surface water bodies:

Type (intermittent stream, permanent lake, permanent spring, etc.)
Origin (natural, artificial: reservoir, irrigation ditch, broken
bottle, etc.)

Water movements (flowing, stagnant, turbulent, etc.; wave
direction and height; current direction and speed; tides)
Hydrography (depth, gradient, bottom type, etc.)

Biotic content (aquatic weeds, oyster beds, etc.)

Water pollution:

Type (industrial-chemical, suspended solids, thermal, sewage, etc.)
Intensity/severity

Duration/frequency (continual, seasonal, occasional, etc.)
Evaporation, evapotranspiration, desiccation (potential, actual)

Climate  types  (humid  mesothermal,  humid  continental,  etc.) 
Weather types
Air temperature:

Where measured (1 inch above ground, 30 inches above ground, etc.)

When measured (noon, 6 P.M., any random time, etc.)

Time period involved (daily, monthly, seasonally, annually, etc.)

How  measured  (mean,  highest  observed,  lowest  observed,  range 
of  variation,  etc.) 

Precipitation: 

How measured (total, range of variation, mean, maximum, etc.)

Time period involved (daily, weekly, monthly, annually, etc.)

Seasonal distribution (continually wet, distinct wet/dry seasons, etc.)
Types (rain, snow, sleet, hail, etc.)

Dew: 

Frequency  of  formation 

Duration  into  daylight  hours  after  formation 

Frost:

Frequency of formation (number of frost-free days, etc.)

Seasonal distribution (date of last spring killing frost,
date of first fall killing frost)

Glaciers

Humidity (relative, absolute, wet-dry bulb temperatures, etc.)

Barometric pressure

Clouds and fog, and clarity/transparency of atmosphere:

Type (dense fog, stratocumulus, etc.)

Frequency of occurrence

Illumination/light/insolation:

Days of sunshine

Length of daylight (also length of growing season)

Extent to which people are exposed to sun each day

Winds:

Direction

Speed/severity/force
Frequency

Seasonal distribution

Special types (up-valley wind, ocean breeze, etc.)

Thunderstorms and lightning (static electricity)
Natural disasters (hurricane, tornado, flood, dust-/sand-storm,
drought, etc.)

Air pollution:

Type (pollen, smoke, toxic gases, etc.)

Frequency

Severity/intensity
Gravity

Magnetism (terrestrial)

Background radiation (ionizing):

Terrestrial (from uranium-bearing black-shale bedrock, etc.)

Solar

Cosmic-ray

Organisms occurring in same area as disease cases (including wild and
domesticated; including vertebrate animals, invertebrate animals,
plants, and protists; including potential and known intermediate
hosts, accidental hosts, artificial or experimental hosts,
reservoirs, carriers, vectors, parasites, etc.)

Biogeographic distributions of such organisms:

Occurrence
Relative abundance
Population size and density

Natural population cycles, variations, and migrations
Degree of concentration versus dispersal

Living habits of such organisms:

Feeding (including biting preferences, food chains, etc.)

Breeding

Resting (hibernating, aestivating, etc.)

Competitive and symbiotic relationships
Amount of contact with man

Disease/health conditions of such organisms

Pesticide or drug resistance among such organisms

Local habitats (grassland, swamp, desert, forest, cultivated field,
pasture, etc.)

Biotic communities (short-grass prairie, oak-hickory forest, etc.)
Biomes  (tropical  rainforest,  taiga,  etc.) 
Biogeographic regions (Neotropical, Holarctic, etc.)

Human population:

Total numbers of people

Density (number of people/square mile)

Rate of increase or decrease of total population
Birth rate (including effects of birth control measures)

Death rate

Population movements (war refugees, nomads, migratory workers,
patients admitted to hospitals far from their residences,
military troop movements, etc.)

Age structure of population

Sex distribution within population

Racial groups within population

Ethnic or nationality groups within population

Socio-economic groups within population (including caste)

Blood-group distribution

Distribution of other human hereditary or genetic factors

Personal medical, hygienic, and sanitary practices and habits (washing
hands, taking sauna baths, protecting newborn infants, etc.)

Public health service practices, expenditures, and facilities:

Source of potable water (surface reservoir, drilled well, etc.)
Treatment of water supply (chlorination, filtering, etc.)

Treatment of sewage

General level of community sanitation
Pest control and eradication programs
General vaccination or inoculation programs

Medical facilities available:

Size and type (large hospital, small clinic, mobile aid station, etc.)
Sponsorship and administration (government, military, missionary,
industrial, academic, research institute, private, etc.)

Numbers

Availability of pathologist and/or laboratory diagnostic service
Ease of access to facilities among different groups within
population (due to cost, distance, etc.)
Medical personnel available:

Type (physicians, veterinarians, nurses, technicians, dressers, etc.)
Numbers

General health level of population

Other diseases common in population (including genetic defects, infectious
diseases, mental disorders, malnutrition, alcoholism, drug
addiction, etc.)

Nutritional and dietetic habits and customs:

Physical/chemical/mineral content of foodstuffs
Nutritional deficiencies

Settlement patterns (urban, suburban, small town, dense rural,
sparse rural, etc.)

Housing preferences and habits:

Length of residence in presently-occupied dwelling
Construction (windows screened, walls brick, roof straw, etc.)

Number  of  people  living  in  each  house 
Amount  of  time  spent  indoors 

Types  and  sizes  of  family  groupings 

Marriage  and  divorce  customs 

Personal  clothing  habits 

Recreational, entertainment, and social habits:

Kinds  (swimming  at  public  beaches,  etc.) 

Frequency 

Types  involving  special  risk  of  exposure  to  disease 

(water  sports,  hiking  in  jungle,  eating  raw  fish,  etc.) 

Educational  level: 

Literacy 

Number  of  high-school,  college,  etc.  graduates 
Educational facilities (schools) available

Land use (grazing, farming, reclamation projects, etc.)

Type  of  economy  (hunting-gathering,  farming,  machine  civilization,  etc.) 

Basis  of  economy: 

Hunting  or  gathering 
Fishing 

Forestry

Agriculture (farming, ranching, etc.; crops involved, e.g.,
cotton, wheat, sesame, etc.; agricultural practices, such
as use of irrigation, chemical fertilizers, human nightsoil,
etc.)

Mining (coal, iron, copper, gold, diamonds, petroleum, etc.)

Manufacturing  (principal  raw  materials,  principal  products) 

Services  (teaching,  research,  consulting,  etc.) 

Occupations  and  jobs: 

Types  present  (automobile  mechanic,  university  professor,  etc.) 

Relative  proportions  (i.e.,  predominant  jobs) 

Kinds involving special risk of exposure to disease
in which interested (such as butchers, sewer workers, etc.)

Economic levels, distribution of income, standard-of-living index,
unemployment

Communications available:

Types

Extent to which utilized

Transportation available:

Types  (railroad,  private  car,  bus,  etc.) 

Extent  to  which  utilized 

Mobility  and  travel  pattern  of  population 

Kinds  involving  special  risk  of  exposure  to  disease 
in  which  interested  (such  as  walking  through  jungle, 
fording  streams,  etc.) 

Crime  statistics 


Military organization of population (none, militia, away-from-home
active duty, etc.)

Political movements, political views

Religions and religious/superstitious customs (such as pilgrimages,
washing in rivers with other worshippers, etc.)

Artistic,  musical,  and  literary  customs  and  activities 


MAPPING  OF  DISEASE 


The  MOD  method  of  data  structuring  is,  admittedly,  hard  to  grasp. 
In  an  attempt  to  simplify  explanation  of  the  basic  concept  —  and  the 
method  of  its  application  —  we  conceived  of  the  analogy  shown  in  the 
adjacent  figure. 


Figure 4-2 presents an orchard that consists of a number of
trees; similarly, a map consists of (is drawn from) a number
of data points, and, in our illustration, a single tree is
analogous to a data point.

The location of the tree within the orchard is comparable to
the location (LOC) of the data point. The size of the tree
may be considered comparable to the value (VAL) of the data
point.

Carrying our analogy further, the various parts of the tree
can be compared with the various parts of the factor statement.
The most obvious, vital, specific items in an orchard
(from the grower's viewpoint) are the fruit on each tree;
analogously, the most obvious, vital, specific items in the
data point's factor are the individual LOF's. Furthermore,
the branches of the tree (bearing fruit/LOF's) can be compared
with MOF's, and the trunk (supporting the fruit-bearing
branches/LOF-bearing MOF's) can be compared to a HOF/POF.


As a second analogy, consider the situation in ordinary
computerized information-processing terminology. Usually, terms
are defined in two levels, as descriptors and as elements.
Elements are analogous to MOF's, whereas descriptors of those
elements are equivalent to LOF's.
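
For readers more at home with present-day programming notation, the same structure can be sketched as a small record type. This is an illustration under stated assumptions, not the MOD file layout: the class name DataPoint, its field names, and the choice of a mapping from MOF code to LOF are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """One tree in the orchard: a location, a value, and a factor statement."""

    loc: tuple  # location, most general unit first
    val: str    # value, kept as text because it may be "14%" or "abundant"
    factor: dict = field(default_factory=dict)  # MOF code -> chosen LOF

    def lof(self, mof_code: str):
        """Return the LOF recorded under a MOF, or None if it was not stated."""
        return self.factor.get(mof_code)


point = DataPoint(
    loc=("USA", "Ind.", "Monroe Co."),
    val="14%",
    factor={"SPA": "Leptospira pomona", "TIM": "6306,6411"},
)
```

Here each key of the factor mapping plays the part of a branch (MOF) and each stored entry the part of a fruit (LOF); a MOF that the source document never addressed is simply absent.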
Figure 4-2. Illustrating the analogy between MOD data structure and an
orchard. (Figure labels: C-MOF's and O-MOF's, both are side or supporting
branches; LOC = location of tree (USA, Ind., Monroe Co.; etc.); VAL = size
of tree (abundant; 14%; etc.); entire combination = 1 data point.)
4.3 DATA REQUIRED FOR MAPPING: MINIMAL AND OPTIMAL

Now that the structure of the data available to the MOD system has
been described, we can more clearly delineate precisely what data are
necessary for mapping disease-environmental factors, and what form they
must be in. First and foremost, the data must be capable of being put into
the format of discrete data points; and this means that they must have a
defined (stated) geographic location (LOC), a value (VAL), and some statement
of factor (see pages 4-6 - 4-11).


If data were collected to meet a particular need, e.g., a specific
MOD system use, its format could be rigidly defined beforehand, and many
problems would be avoided. Unfortunately, the MOD system user will, as a
rule, be dependent upon data which he had no role in generating, data that
were developed without any thought of computer processing, much less mapping.


Although "geographic location" and "value" are not without their
problems, the greater difficulties come with FACTOR. In part this reflects
the vagueness of our language, in part the enormous number of attributes
which characterize medical environmental situations, many of which are
incomplete in themselves, many of which overlap. It is factor, more than any
other component of the data point, that restricts usefulness of the data.

It is a limited specification of factor in the original data source that is
most likely to "handicap" the data point so that it is closer to minimal
than optimal. This is why the consideration of minimal and optimal data
required for mapping concentrates on factor.


In data points which deal with certain environmental situations it
is possible for the factor to be adequately stated by a single LOF (as
described earlier in discussing the terminology of the MOD data structure).
Such a factor is called a uni-LOF uni-(O-)MOF HOF. But with data points
that deal with a disease situation, the factor statement requires a
combination of several LOF's/MOF's. Our experience indicates that LOF's from
the following six MOF's must be given in order for any disease data point
to be mappable.

(1) Time period for which data applies (whether this is the
date of onset, or date of termination of disease, or
some intermediate time during its course).

(2) Disease measure, e.g., "number of cases existing during
the specific time interval".

(3) General kind of disease, e.g., "diarrhea" or "leptospirosis".

(4) Disease agent (as precise as possible), e.g., "Schistosoma
mansoni" or "Leptospira pomona".

(5) Method of diagnosis, e.g., "serologic" or "isolation from
urine".

(6) Identity of animal infected (as precise as possible), e.g.,
"Homo sapiens" or "Canis domesticus".

These requirements are not so absolutely restrictive as they may seem;
however, the less specific the factor statement is with respect to any one
of these points, the less useful the data becomes. To consider some very
bad examples:

Time period might have to be specified as "some time
during 1900-1950"; General kind of disease as "diarrhea";

Disease agent as "unknown"; Method of diagnosis as "clinical
impression"; Identity of animal infected as "nonhuman animals".
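
A check for the six required MOF's, together with LOC and VAL, might be sketched as follows. The codes TIM, DMS, SPA, and AIP are those of the factor catalogue; the codes GKD and MDG for general kind of disease and method of diagnosis are placeholders invented for this sketch, as is the function name missing_requirements.

```python
# Required MOF's for a mappable disease data point.
# "GKD" and "MDG" are hypothetical placeholder codes, not catalogue codes.
REQUIRED_MOFS = {
    "TIM": "time period for which data applies",
    "DMS": "disease measure",
    "GKD": "general kind of disease",
    "SPA": "disease agent",
    "MDG": "method of diagnosis",
    "AIP": "identity of animal infected",
}


def missing_requirements(loc, val, factor):
    """List what a data point still lacks before it can be mapped."""
    missing = []
    if not loc:
        missing.append("LOC")
    if val is None or val == "":
        missing.append("VAL")
    # A vague LOF such as "unknown" still counts as stated; only absence fails.
    missing.extend(code for code in REQUIRED_MOFS if not factor.get(code))
    return missing


vague_but_mappable = {
    "TIM": "some time during 1900-1950",
    "DMS": "number of cases",
    "GKD": "diarrhea",
    "SPA": "unknown",
    "MDG": "clinical impression",
    "AIP": "nonhuman animals",
}
```

Note that the check only establishes that something was stated under each MOF; it says nothing about how useful so vague a statement will be.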

These six items represent the minimal requirements for a meaningful
disease-environmental data point. Further characterization of the data
point's factor would permit the data point to be used in some of the more
complicated MOD system calculations. Consequently, it would be highly
desirable to include, in the factor statement, LOF's belonging to the
following MOF's:
(1) Primary source document. *

(2) Secondary source document. *

(3) Professional evaluation of data source. *

(4) Computer evaluation of data point. *

(5) Unique designator for combination of smallest and
largest samples involved.

(6) Size of smallest and (7) largest sample involved.

(8) Age group of smallest and (9) largest sample involved.

(10) Sex of smallest and (11) largest sample involved.

(12) Racial or ethnic group of smallest and (13) largest
sample involved.

(14) Occupational group of smallest and (15) largest
sample involved.

(16) Basis for sampling of smallest and (17) largest
sample involved.

(18) Domestic state of (nonhuman) animal infected.

(19) Epidemiologic state of disease within population of
animal (including human) infected.


* These are C-MOF's and, as previously stated, should accompany every data
point; however, they are not required in order for the data point to be
mappable.

Many other disease-related LOF's and MOF's can be devised, as demonstrated
in the catalogue of MOD system factors and on the data extraction
forms (to be described later). Any or all of these can be added to a data
point as part of the statement of the data point's factor. However, the
six "required" MOF's that we have listed, plus the LOC and VAL, comprise
the elements minimally required for effective processing of the disease
data point by the MOD computerized mapping system. An optimal data point
would include the nineteen "highly desirable" MOF's listed, and many more,
selected on the basis of known present need together with estimates of
probable future requirements. But until there has been considerable
operational experience with the MOD system, it will not be profitable to
speculate what precise combination of MOF's/LOF's makes up the factor of the
"optimal" data point.

On  the  basis  of  rather  limited  experience  we  have  found  that  disease 
maps  were  more  readily  constructed,  and  imparted  more  information  to  the 
viewers,  if  the  data  points  had  the  following  characteristics: 

(1) Their values (VAL's) were all originally given in
consistent, uniform format, so that no interpretative
equating of dissimilar types of values had to be
performed.

(2) Their factors were comparatively short, simple, and
straightforward, rather than complex and involved.
(This trait enables a number of points to be compared
with reasonable assurance that each point describes
the same aspect of the disease/environmental
situation.)

(3) Their factors were oriented toward the raw data rather
than toward the conclusions drawn by authors of the
papers from which the points were extracted.

(4) Their locations (LOC's) were distributed fairly
uniformly over a large geographic region rather than
being clustered in a few small spots with large
distances between the spots.

Although these characteristics of data points used for mapping are
highly desirable, much of the data actually available to the MOD effort fall
short of these ideals in one or more ways. We emphasize once again that,
from the very beginning, the MOD system was conceived as a mechanism to
utilize available data as well as possible, relying upon the informed
bio-medically oriented user to examine output critically and with insight into
its limitations.

There are three alternatives to this approach:

(1) Let "well enough" alone.

(2) Set about to collect a great mass of "ideal" medical-
environmental data (obviously impractical).

(3) Wait until "ideal" data appears upon the scene.
None of these alternatives was acceptable to us. Our view, as expressed
in the Preface, is:

Many of the most important problems have the softest
information, but we must identify what information
there is, and learn its limitations. We must work
toward correcting deficiencies in the data base, but
even more important, we must develop better methods
of using what information is available.

We are at the stage of world development where many
important judgments must be made in the absence of
hard data. If we do not use what data is available,
what shall we use?

4.4 PROBLEM AREAS RELATED TO DATA CHARACTERISTICS

4.4.1 DATA STRUCTURE LIMITATIONS

The method of structuring data which we have described in detail has
been used successfully by MOD project members to extract disease and
environmental data from a variety of sources and, then, to map that data.
Furthermore, this structure has provided the basis upon which data extraction
forms, data files, and data processing procedures have been designed
for the MOD system. But there are limitations to this method of structuring
data.

We emphasize once again that the data structure and catalogue of
factors are specifically oriented toward mapping. The MOD system was never
intended to be a general purpose data-storage-and-retrieval system; rather,
it is intended to yield special purpose, disease/environmental maps
accompanied by supplementary information. This is why our method of
structuring data has developed around the concept of mappable data points.

This does not mean that our method of structuring is restricted to mapping,
but it does imply that modification would probably be necessary if the
method were applied to other areas.
Organization of the data structure for computer processing permits
data to be entered at any level of generality, from the very general to
the very specific, and to be retrieved at any level equal to or more
general than (but not more specific than) that at which it was originally
entered. This means that an individual-patient-record data point can be
used in any retrieval, in contrast to (for example) a county-average data
point that could never be used to determine the extent of disease in a city
within (and smaller than) that county. Generalizing, a computer system can
never go beyond the specificity of data within its data pool, but it can
always combine (miscible) data into larger groupings. Obviously, then, to
be most useful, each data point entered into the MOD system should be as
specific and precise as possible.
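
The one-way nature of retrieval (specific data can be grouped upward, general data cannot be split downward) can be shown with a short sketch. It assumes that locations are written most general unit first, that the values are case counts, and that the points are miscible, i.e., non-overlapping. The function name roll_up is hypothetical, and the city of Bloomington is added only for illustration.

```python
from collections import defaultdict


def roll_up(points, level):
    """Total case counts by the first `level` components of each location.

    Points entered more generally than `level` cannot be subdivided,
    so they are returned separately instead of being used.
    """
    totals = defaultdict(int)
    unusable = []
    for loc, cases in points:
        if len(loc) < level:
            unusable.append((loc, cases))
        else:
            totals[loc[:level]] += cases
    return dict(totals), unusable


points = [
    (("USA", "Ind.", "Monroe Co.", "Bloomington"), 3),  # city-level entry
    (("USA", "Ind.", "Monroe Co."), 7),                 # county-level entry
    (("USA", "Ind."), 20),                              # state-level entry
]

county_totals, too_general = roll_up(points, 3)
```

At the county level the city and county entries combine, while the state-level entry is set aside; at the state level all three entries can be used.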

4.4.2 LOCATIONS AND VALUES OF DATA POINTS

Various  aspects  of  locating  data  points  were  discussed  previously 
under  Output  Analysis,  but  a  few  additional  comments  are  appropriate  to  the 
assignment  of  values  to  data  points. 

The locations of data points input to the MOD system will be stated
in terms of political unit names or in terms of longitude-latitude point
localities within named political units. (Data grouped for a particular
geographical unit will most often be treated as if they all existed at the
center-of-gravity of the area of the geographical unit.)

The data point concept which we have discussed is universally applicable
to all the data to be put into the system. Conversely, if the data
cannot be phrased in terms of discrete data points, it cannot be processed.
Therefore, if one wishes to have a disease-environmental map that shows the
probable effects of physiographic features such as oceans, deserts, and
mountains, he must do so by modifying the particular set of data points
used in constructing the map, rather than by modifying the data structure.
For example, there should be no human disease prevalence recorded for
the (open) oceans or for (uninhabited) mountain peaks. In order for the
system to recognize this situation, the system dictionary must contain a
description of all such physiographic features. Once this is done, then
such features can be treated as desired. For example, in Fig. 3-3S, oceans
have been mapped as 0's; in Fig. 3-15, as blank spaces. This is not a
unique problem of computer mapping; the human cartographer does this too
(Robinson, 1960, p. 160), taking into account many items which are not
strictly a part of the data-point set being mapped. The difference between
the human cartographer and the computer is that the human cartographer does
this intuitively whereas the computer must be given explicit instructions.

In many situations, the assignment of appropriate values to data
points has proved extremely difficult, due principally to the incorrect or
ambiguous usage of terms relating to disease measures, samples, and extent
of diseases in populations. For example, words such as "incidence" and
"prevalence" are frequently used interchangeably for several mathematically
different ratios or indexes (see Glossary for precise meanings). Careful
reading of the data source document sometimes indicates which ratio was
meant, but more often, it does not. In this latter case, a professional
judgment must be made by the data extractor as to what index was meant.
Similar confusion results because sizes of the samples from which published
numerical values were calculated are often not given. Suppose the MOD user
requests a map that shows infection rate in terms of numbers of infected
individuals. To produce this map we must know both the percent infection
rate (often given) and the number of individuals examined (often not given
or, if given, confused with the size of the total population from which the
sample was drawn) unless, of course, the number of infected individuals was
given in the data source(s).
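The arithmetic in this example is simple, but it depends on a figure that is
often missing. A minimal sketch in Python follows; the function name and the
use of None for "not reported" are assumptions made for illustration and are
not part of the MOD programs.

```python
def infected_count(percent_rate, n_examined):
    """Derive the number of infected individuals from a percent infection
    rate and the number of individuals actually examined.

    n_examined must be the size of the sample examined, not the size of
    the total population from which the sample was drawn.
    """
    if n_examined is None:
        # The source gave a rate but no sample size: no count can be derived.
        raise ValueError("number of individuals examined was not reported")
    return round(percent_rate / 100 * n_examined)


# Example: a 15 percent infection rate among 200 individuals examined.
print(infected_count(15, 200))
```

When the sample size is missing, the sketch refuses to guess rather than
substituting the total population, which is the confusion described above.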

We hope that development of the MOD system will give further motivation
and stimulus to those who generate and report bio-medical data to be precise
and rigorously consistent in their use of such terms as we have described,
perhaps standardizing definitions in the manner of Dorn, 1957.


Four  types  of  values  for  disease  data  points  can  be  distinguished,  and 
these  fall  into  two  major  categories: 

a.  Qualitative,  in  which  alphabetic  symbols  (or  words)  denote: 

(1)  Occurrence,  e.g.,  "present"  or  "absent", 

(2)  Abundance,  e.g.,  "common"  or  "abundant". 

b.  Quantitative, in which numeric symbols (or numbers) denote:

(3)  Absolute numbers, e.g., "10" infected individuals.

(4)  Percentages,  e.g.,  "15"  percent  infection  rate. 

Since published reports of disease data include all of the above types
of values, the MOD system must contain algorithms suitable for converting
any of the types of values to any other. It would be impossible to map
together points whose values were "10 cases", "15% infection rate",
"common", and "present". All the values must be converted to equivalent
values, expressed as only one of the possible types of values. For example,
values of "rare" could be converted, either automatically or under user
specification, to "2%" or to "5 cases" for compatibility in processing.*
Values stated as "N cases", with the smallest and largest sample sizes given
as "S_s" and "S_L", can be converted to percentage-type numbers by the
formula:


In certain situations S_s and S_L may be so small that the resulting
equivalent percentage-type value for the data point will be artificially
high if used alone on a map. Such high values could, perhaps, be suitably
marked


*  The basis for conversion will vary markedly, depending upon the
disease-environmental situation. For example: leprosy is
considered to be "common" in areas where the prevalence is 2%;
a 2% prevalence of dental caries, on the other hand, would
warrant the term "uncommon" -- at least in many parts of the
world.
and not used in further processing so that the resulting map does not
display artificially high disease "peaks".
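The conversions discussed above can be sketched in Python. This is an
illustration under stated assumptions, not the MOD algorithm: the function
names and the lookup table are hypothetical, the denominator is assumed to be
the mean of the smallest and largest sample sizes, and the threshold of 30
below which a percentage is marked as suspect is an arbitrary choice. Only the
pairing of "rare" with 2 percent is taken from the text.

```python
# Hypothetical lookup table for ONE disease-environmental situation.
# Only the "rare" -> 2 percent pairing comes from the text; a real table
# would differ from disease to disease (compare leprosy and dental caries).
QUALITATIVE_TO_PERCENT = {"rare": 2.0}


def qualitative_to_percent(term, table=QUALITATIVE_TO_PERCENT):
    """Convert an abundance word such as "rare" to a percentage."""
    return table[term.lower()]


def cases_to_percent(n_cases, s_smallest, s_largest, min_sample=30):
    """Convert "N cases" to a percentage-type value.

    Assumption: the denominator is the mean of the smallest and largest
    sample sizes. Returns (percent, suspect), where suspect is True when
    the samples are so small that the percentage may be artificially high.
    """
    mean_sample = (s_smallest + s_largest) / 2
    if mean_sample <= 0:
        raise ValueError("sample sizes must be positive")
    percent = 100.0 * n_cases / mean_sample
    return percent, mean_sample < min_sample


print(qualitative_to_percent("rare"))
print(cases_to_percent(10, 150, 250))   # large samples
print(cases_to_percent(3, 4, 6))        # very small samples, marked suspect
```

Returning the "suspect" mark alongside the value lets later processing leave
such points out, so that they do not create artificial peaks on the map.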

The problem of values is further complicated by the fact that type of
value is related to the factor statement accompanying it -- particularly to
the disease measure stated. Somehow, the two must be checked to ensure that
they are appropriate. For example, in the MOD system, "common" would not
be an appropriate value for a data point whose factor stated that the point
was for "number of cases existing during specific time period".

In order to map data points at all, each data point must be assigned
a single, unique value. This requirement reflects the concept of a map as
a mathematical surface (X, Y, Z) in three-dimensional space. By analogy
with topographic maps displaying elevation above sea level, it is obvious
that a particular geographic point cannot be, simultaneously, 10 feet and
90 feet above sea level. In situations where two specific disease agents
were tested for, one found to be present but the other not, there is no
conflict with the above statement; two separate data points must be made,
one with the appropriate positive value and the other with a zero value.
The MOD data structure allows such data to be recorded on a single data
extraction form since the data input processing programs can automatically
convert these data into one data point with the appropriate positive value
and another data point (same LOC) with a zero value (for the agent found to
be absent).
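The splitting of one extraction record into several single-valued data points
can be sketched as follows. The class and function names, the field layout,
and the use of None to mean "tested for but absent" are assumptions made for
this illustration; the actual input programs are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    loc: str      # location (LOC)
    agent: str    # disease agent tested for
    val: float    # the single value (VAL) mapped at this location


def split_extraction_record(loc, agent_values):
    """Turn one extraction record covering several agents into one data
    point per agent. An agent found to be absent (None) gets a zero value."""
    return [
        DataPoint(loc=loc, agent=agent, val=float(val) if val is not None else 0.0)
        for agent, val in agent_values.items()
    ]


# One form, two agents tested at the same place: one present, one absent.
points = split_extraction_record("site 1", {"agent A": 12, "agent B": None})
for p in points:
    print(p)
```

Each resulting point carries exactly one value, so the requirement of a
single Z for each (X, Y) holds separately for every agent.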

The MOD data structure is capable of handling both individual-type
(patient/case/clinical) data and group-type (collective/summarized) data.
This capability is provided by use of a value, "1" case, plus such LOF's/
MOF's as age, sex, occupation, etc. for an individual case, and a value,
"23" cases, plus somewhat different LOF's/MOF's to describe the
characteristics of a group of (23) cases.
4.4.3 UNRELIABLE DATA

Source  documents  of  medical-environmental  data  vary  greatly  in 
quality.  Because  of  this,  an  estimate  of  the  reliability  of  the  data  is 
highly  desirable,  and  the  MOD  system  approaches  this  in  two  ways. 

Reliability of the data source, i.e., trustworthiness, is termed
"professional evaluation of data source", and represents a value judgment
by the data extractor (or analyst) of the source document's author and his
laboratory or institution, as well as the document, per se (experimental
design, methodology, etc.). Even though this evaluation will have a highly
subjective flavor, it will be much better than nothing. The professional
evaluation will probably be limited to a statement of "more reliable", or
"less reliable", or "reliability not determined". (A more detailed
breakdown of reliability by data extractors has proved unfeasible in our
experimental studies.) Some of the factors that would be used by the data
extractor in arriving at his decision are:

(1)  Are  the  data  published  by  a  highly  reputable  journal? 

(2)  Is the report by an author (laboratory, institution)
of good reputation?

(3)  Is  the  experimental  design  good? 

(4)  Were  the  experiments  done  with  attention  to  detail? 

(5)  Were  the  conclusions  based  upon  a  broad  sample  (experience)? 

(6)  Are there contradictions within the report?

(7)  Are  the  stated  results  completely  justified  by  the 
observations? 

(8)  Do  the  results  correspond  with  those  reported  from 
other  sources? 

(9)  Is  the  study  comprehensive? 

(10)  Do  the  references  cited  indicate  a  thorough  background 
knowledge? 

The  professional  evaluation  will  be  constant  for  all  data  points  taken  from 
a  particular  source  document.  However,  since  data  points  within  a  single 
source document can and will vary greatly, we have defined another factor
of reliability, termed "computer evaluation of data point". Unlike the
former, this evaluation will vary for each data point taken from the
document. It will be performed by the computer system itself, according to
an algorithm built into the programs that take data into the system. This
evaluation concerns the consistency and completeness of each data point.
It is based upon specific characteristics found within the data point -- no
"judgment" of trustworthiness is involved. For example, one possible
algorithm would be to determine if any of the LOF's, or the LOC or VAL, of
the data point contained ?'s. If they did, the machine would assign "less
reliable" to the point; if they did not, it would assign "more reliable" to
the point. Another possible algorithm would allow use of a grading system
to represent computer evaluation, each data point to be given a "grade"
termed "Computer evaluation number (CEN)." With this method, each MOF would
contribute to the total CEN. For example, the MOF, "Time period", could
contribute a maximum of 3 points assessed as follows: if the LOF were a
year or part of a year, add 2 to the running total; if a range of 2-9 years,
add 1 to the total; if a range of 10 or more years, add 0; if no ?'s, add 1;
and if ?'s, add 0. Using this method for 12 MOF's, the MOD study team
developed a Computer Evaluation Number for leptospirosis data such that the
maximum possible CEN = 18. This leptospirosis CEN algorithm was tested with
actual data; data points which were judged "good" by all members of the study
team tended to have CEN's of about 12, while those judged "bad" tended to
have CEN's of about 6.
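Both evaluation algorithms can be sketched in a few lines of Python. The
function names and argument layout are hypothetical, and the sketch assumes
that uncertainty is marked by a "?" character in a field. Only the scoring
rule for the MOF "Time period" is given in the text, so the contributions of
the other MOF's are left to the caller.

```python
def simple_evaluation(lofs, loc, val):
    """First algorithm: any "?" in the LOF's, the LOC, or the VAL marks
    the data point as "less reliable"."""
    fields = list(lofs) + [loc, val]
    uncertain = any("?" in str(field) for field in fields)
    return "less reliable" if uncertain else "more reliable"


def time_period_points(span_years, has_question_marks):
    """Second algorithm: contribution of the MOF "Time period" to the CEN.

    span_years is the length of the period covered (1 or less for a year
    or part of a year). The maximum contribution is 3 points.
    """
    if span_years <= 1:
        points = 2      # a year or part of a year
    elif span_years < 10:
        points = 1      # a range of 2-9 years
    else:
        points = 0      # a range of 10 or more years
    if not has_question_marks:
        points += 1     # no ?'s in the field
    return points


def cen(contributions):
    """The CEN is the running total of the contributions of all MOF's."""
    return sum(contributions)


print(simple_evaluation(["cattle", "196?"], "Illinois", "4"))
print(time_period_points(1, False))
print(time_period_points(12, True))
```

A graded number, unlike the two-way label, lets the user choose a retrieval
cutoff, for example somewhere between the typical "good" and "bad" scores
reported above.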

These two measures of data reliability can (and ordinarily will) be
helpful in the process of constructing a map since they would indicate
whether or not the data point in question should be retrieved. However,
once the data points have been selected, the matter of "reliability" will
not enter further into actual construction of the map (but can be an
important component of NAR). Data of different reliabilities can be
presented as separate output maps if requested. For example, dealing with
the same medical-environmental situation and same geographic area, one map
could be based upon only those data points judged "more reliable", a second
map based upon only those data points judged "less reliable", and a third
map based only upon those data points characterized (by professional
evaluation) as "reliability not determined". With the three maps in hand
the user could overlay them to obtain a composite map based upon all data
points. Then, if he wished, he could "subtract" the "less reliable" and/or
the "reliability not determined" data.
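The separation into three maps, the overlay, and the "subtraction" amount to
grouping and filtering the retrieved data points. A minimal sketch, assuming
each point is a dictionary with a hypothetical "reliability" key holding one
of the three labels:

```python
from collections import defaultdict


def partition_by_reliability(points):
    """Group retrieved data points by reliability label (one map per group)."""
    groups = defaultdict(list)
    for point in points:
        groups[point["reliability"]].append(point)
    return dict(groups)


def composite(groups, subtract=()):
    """Overlay all groups, leaving out any labels the user subtracts."""
    return [
        point
        for label, group in groups.items()
        if label not in subtract
        for point in group
    ]


points = [
    {"id": 1, "reliability": "more reliable"},
    {"id": 2, "reliability": "less reliable"},
    {"id": 3, "reliability": "reliability not determined"},
    {"id": 4, "reliability": "more reliable"},
]
groups = partition_by_reliability(points)
print(len(composite(groups)))
print([p["id"] for p in composite(groups, subtract=("less reliable",))])
```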

Of course there are many limitations of raw data that cannot be
resolved in any way short of going back to the individual who made the
observations and asking him to clarify or amplify. When the limitations are
clearly evident one can guard against misuse, but frequently the limitations
are not evident to the data extractor, nor to the data analyst, nor to the
computer. The likelihood that such data will be misused is an inherent
limitation to any computerized system, including the MOD system. Consider,
for example, a report stating that the pH in a particular pond was 5.2.
(This information is quite important in the ecology of leptospirosis since
the leptospires can survive (in an infectious state) quite well in neutral
or slightly alkaline water, but quickly die in an acid environment.) The
particular water sample may have been taken properly, and the pH measured
accurately, yet the data may be seriously misleading. If the pond in
question has a very gradually sloping marshy intake side and a dam at the
outflow side, the pH could vary markedly. Near the dam (certainly the
easiest site from which to take a water sample) the pH might well be 5.0
whereas on the marshy (intake) side it might be as high as 8.

There are many, many other aspects of "unreliability" that are
difficult to assess. For example, in many of the developing countries where
the major part of the disease data comes from sparse medical centers, the
implied geographic distribution of disease may be quite different from its
actual geographic distribution since the hospital figures, per se, do not
reflect the source of the patient population.


MAPPING  OF  DISEASE 


4.4.4  INCOMPLETE  DATA 


Many  highly  reliable  data  are  of  quite  limited  use  in  the  MOD  system 
because  they  are  incomplete  —  insufficiently  characterized  as  to  time  or 
location  or  factor. 


An example of the type of statement frequently found in published
papers is: FOUR PERCENT OF CATTLE IN SOUTHERN ILLINOIS HAVE LEPTOSPIROSIS.
This apparently straightforward statement leaves many vitally important
questions unanswered:

(1)  Over  what  time  period  were  the  data  collected? 

(2)  When  were  the  data  reported? 

(3)  If this is a conclusion from a composite of different
studies, are we certain that there is no overlapping?

(4)  What was the size of the sample(s)?

(5)  What are cattle?

     all bovids?
     a limited number of species of bovids?
     a limited number of breeds within one species?
     just cows?
     just mature animals?
     etc.?

(6)  What was the nature of the sample(s) of "cattle"?

     sick cattle?
     cattle selected because of the state health
     department's interest in certain regions?
     cattle selected because of university studies
     being carried out at specific chosen (e.g.,
     cooperating) farms?

(7)  Is it likely that the prevalence was uniform throughout
southern Illinois?

(8)  What are the precise geographic limits of "southern Illinois"?

(9)  What is "leptospirosis"?

     disease in terms of:
     -- clinical illness?
     -- detectable antibodies?
     -- recoverable organisms?

(10)  What was the inherent accuracy of the diagnostic procedure(s)?

(11)  What was the inherent sensitivity of the diagnostic
procedure(s)?

(12)  How  reliable  was  the  laboratory  (or  laboratories)  that 
performed  the  analyses? 

(13)  Were  the  samples  for  analyses  entirely  adequate? 

(14)  Were  the  studies  which  led  to  this  conclusion  well  planned 
(i.e., was the experimental design good)?

(15)  If this report is a summary/analysis of a collection, is it
correct? (i.e., was there an error in transcription or
mathematical manipulation of data?)

(16)  Is  this  report  completely  honest  (i.e.,  was  there  intent 
to  mislead)? 

Sometimes,  in  papers  of  this  sort,  the  (professional)  data  extractor 
can  infer  answers  to  some  of  the  critical  questions,  thus  increasing  the 
usefulness/applicability  of  the  data.  Often,  however,  answers  to  the 
questions  simply  cannot  be  gleaned  from  the  report,  and  data  that  would 
otherwise  be  highly  valuable  and  widely  applicable  becomes  of  very  limited 
usefulness.*

4.4.5 CONTRADICTORY AND ERRONEOUS DATA

More  often  than  had  been  anticipated,  we  encountered  contradictory 
data  in  source  documents.  Sometimes  the  context  indicated  which  of  the 
two alternatives was correct. At other times, however, the only way to
resolve the problem was to communicate directly with the author of the paper.


*  Every  effort  must  be  made  to  see  that  incompleteness  of  data  is  not 
the  fault  of  the  data  extractor,  and  we  consider  this  aspect  of  the 
problem  in  the  next  section:  5.  Data  Collection. 
A more common -- and more important -- problem comes when different
data sources present data that are contradictory. Which of the data are
correct is, ultimately, a matter of value judgment.* But the MOD system
should (and has been designed to) recognize contradictory data as
incompatible. There is only one practical way to recognize such
contradictory data: during the synthesis of retrieved data necessary for
map construction, the MOD system must check each retrieved data point
against all the other retrieved data points. If the LOC and all the LOF's
of the factor of the two data points are identical (not just similar), and
if the VAL's of the two points are not equivalent, the two points will be
deemed contradictory. The system will either let both points stand (to be
combined as the user directs, e.g., by averaging) or will call them to the
attention of the user (for selection or correction, if desired). Data
points which are inconsistent with each other, but not specifically
contradictory, cannot be detected directly by the system. Such
inconsistencies could be found only by careful examination of the data file
by a biomedical professional.
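The pairwise check described above can be sketched as follows. The dictionary
keys ("loc", "lofs", "val") and the function name are hypothetical, and the
test of "equivalence" is left as a parameter because the text does not define
it; plain equality is assumed by default.

```python
from itertools import combinations


def find_contradictions(points, equivalent=lambda a, b: a == b):
    """Return every pair of points whose LOC and LOF's are identical
    (not just similar) but whose VAL's are not equivalent."""
    pairs = []
    for a, b in combinations(points, 2):
        same_description = a["loc"] == b["loc"] and a["lofs"] == b["lofs"]
        if same_description and not equivalent(a["val"], b["val"]):
            pairs.append((a, b))
    return pairs


retrieved = [
    {"loc": "county A", "lofs": ("cattle", "1964"), "val": 4.0},
    {"loc": "county A", "lofs": ("cattle", "1964"), "val": 9.0},  # contradicts the first
    {"loc": "county A", "lofs": ("cattle", "1965"), "val": 9.0},  # similar, not identical
]
for a, b in find_contradictions(retrieved):
    print("contradictory:", a["val"], "vs", b["val"])
```

Comparing every point with every other grows with the square of the number of
points retrieved; grouping points by (LOC, LOF's) first would give the same
pairs with less work, at the cost of departing from the literal description.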

Several other kinds of errors can be detected by the computer system
during data input processing and called to the attention of personnel
entering the data: for example, data points containing inappropriate values,
incorrect LOF's in particular MOF's, and so forth.

4.4.6  SECONDARY  DATA  POINTS 

Published papers nearly always cite work done by other researchers
and, as previously explained, the MOD data structure permits construction
of secondary data points -- points derived from references quoted by the
source document being extracted. It is highly desirable to include such


*  The MOD system can help in this judgment by providing the
"professional evaluation of data source" and the "computer
evaluation of data point".
secondary data points for a variety of reasons: the quoted data may be
from a personal communication or from an obscure journal, etc., or it may
include the explanatory comments of an acknowledged authority, or a critical
evaluation of inconsistencies or inadequacies, and the like.

One of the problems with secondary source documents concerns
duplication, since more than one author may reference the same paper.
Duplication might have run to 30 percent in our leptospirosis reprint file
without careful control.

Secondary source data creates other problems too. It is often quite
incomplete as to data. Furthermore, it represents an incomplete source
document reference. These are reasons why secondary source data should be
replaced when the primary document is extracted. (Evaluative or
explanatory comments need not be discarded.) This requires that
bibliographic citation be arranged to allow reference in updating such a
point when the primary source is located.

4.4.7  LOCATION  TERMINOLOGY 

The fact that many relevant papers are written in foreign languages
is another data characteristic that contributes difficulty, especially with
geographic locations. Place names, e.g., cities and provinces, are usually
stated in the native language or in transliterated form. Geographic areas
(e.g., "jungle region" or "coastal region") are also often given in the
native language, but these must be translated to become explicit. For
example, while extracting data from the province of Bahia, Brazil, reference
was made to "Conquista" and to "Litoral" in a context that suggested they
were of similar nature, but (as we learned) Conquista is the name of a
village, and litoral means coastal. Comparable problems arise when the
same locations are reported in different ways, e.g., St. Petersburg/
Leningrad, Peking/Peiping, Tokyo/Tokio, etc. Of course, once recognized,
many of these can be treated as synonyms. (Others, e.g., St. Petersburg/
Leningrad, are not strict synonyms since they relate to different (historic)
times.) It is not only foreign languages that cause problems of this sort;
consider Cape Canaveral/Cape Kennedy.
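The difference between strict synonyms and names tied to a historic period
can be made concrete with a small lookup. The table layout, the function
name, and the choice of "Leningrad" as the canonical entry are assumptions
made for this sketch; the year bounds are included only to show the
mechanism and would in practice come from a gazetteer.

```python
# Strict synonyms: variant spellings of one and the same name.
STRICT_SYNONYMS = {"tokio": "Tokyo"}

# Names tied to a historic period: name -> (canonical name, first year, last year).
# None means "no bound on that side".
PERIOD_NAMES = {
    "st. petersburg": ("Leningrad", None, 1914),
    "leningrad": ("Leningrad", 1924, None),
}


def resolve_place(name, year=None):
    """Return (canonical name, in_period). in_period is False when the name
    is used for a year outside the period in which it applied, so that the
    data point can be called to the attention of the data analyst."""
    key = name.strip().lower()
    if key in STRICT_SYNONYMS:
        return STRICT_SYNONYMS[key], True
    if key in PERIOD_NAMES:
        canonical, first, last = PERIOD_NAMES[key]
        in_period = year is None or (
            (first is None or year >= first) and (last is None or year <= last)
        )
        return canonical, in_period
    return name.strip(), True   # unknown names pass through unchanged


print(resolve_place("Tokio"))
print(resolve_place("St. Petersburg", 1900))
print(resolve_place("St. Petersburg", 1950))
```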

4.5 TYPES AND CHARACTERISTICS OF DATA SOURCES

Four basically different types of data sources are available to the
MOD project. These are:

(1)  Published prose summaries (monographs, books, proceedings,
technical notes, etc.).

(2)  Unpublished prose summaries (progress reports, laboratory
reports, letters, and (oral) comments, etc.).

(3)  Unpublished raw data (field notes, various completed
data-collection forms, punched cards, and other items
used while preparing, but not included in, published
papers).

(4)  Published  and  unpublished  maps  and  photographs. 

The  scope  of  these  data  sources  varies  immensely.  At  one  extreme  lie 
broad  surveys  of  large  regions;  at  the  other  extreme  are  detailed,  in-depth 
studies  of  small  areas  or  of  individual  cases. 

Ordinarily, the quality of published prose summaries exceeds that of
the other types of data, but this quality varies greatly, as anyone familiar
with modern scientific literature knows. But quality is a subjective term,
and what is good in one context may be poor in another; "good" papers do
not necessarily yield good data points for the MOD system. There are
several general characteristics of "good" papers that help to explain the
reasons for this apparent paradox.

First, most good papers summarize extensive studies. Their purpose is
to present (and support) general conclusions concerning the disease
situation in a geographical area rather than offer a mass of unprocessed
data points. Rarely is all of the data given that the author collected and
on which the paper is based -- and it is usually not possible to infer what
the original raw data was. From previous discussion of the MOD data
structure, it is apparent that raw data (converted to data points) is the
most desirable input to the MOD system. In a sense, the output desired from
the MOD system represents a summary made from raw data -- and a good paper
accomplishes the same thing (though not in the same form). Another
important reason why "good" papers may be poor sources of data for the MOD
system is that they often present data in a form that cannot readily be
converted to the MOD data point format. And sometimes, although papers
state data in a suitable form, they omit one or more of the (six) critical
items necessary for the data to be mappable. Or perhaps those data points
which can be constructed are too sparse to be handled effectively by
current mapping procedures.

A general defect of disease literature is that it lacks environmental
data linkable with specific disease data points. In recent years more
authors have become aware of the importance of presenting relevant
environmental data in their medical papers; hopefully, this trend will
continue. But at this time, if only those disease papers presently in our
files which had good linkable environmental data were extracted, most of
the disease data points necessary to constructing simple disease maps would
be lost. This underscores the fact that disease data points need to be
extracted even though unaccompanied by good environmental data.
Environmental data can be accumulated independently from other sources.
Obviously, this is not optimal, but it is a means to bring together data
which bears on the same problem/area.

Ideally, data is collected to meet specific requirements, including
format. This aspect of data is discussed in detail in Section 5.

Once again we emphasize that, no matter how efficient the computer
manipulation nor how beautifully structured the output information, this
information cannot be better than the input data. Reported variation in
a disease-environmental situation may represent actual variation in that
disease-environmental situation, but it may also represent:

•  outright fabrication (sometimes politically
inspired)

•  incorrect generalization based upon inadequate
data

•  variation in reporting practices

•  variation in distribution of medical
personnel/facilities

•  variation in diagnostic criteria and/or
methods

•  a very transient influx of persons from a different
area (perhaps merely to attend the medical clinic
which was collecting the data)

•  etc.


This section has been principally concerned with data characteristics.
Supplementing this general discussion, a consideration of specific data
sources -- narrative as well as map form -- is to be found in the Appendix.
ABSTRACT - This section considers
various data sources and the ways to
select, extract and arrange data from
those sources. Data extraction
procedures are described and several
data extraction forms are reproduced.
Then, with the preprocessed data in
hand, the methods of entering these
into the MOD system are considered.


"Sound  generalization  can  follow  only 
after  the  determination  of  precise 
facts. Data collecting is the indispensable
means to synthesis."


Hans  Zinsser 
5.0 GENERAL CONSIDERATIONS

A very large mass of potentially useful data is available to us -- much
more than we can hope to assimilate. We have not forgotten that the primary
goal of the MOD project is to develop a computerized disease-mapping system
rather than to amass a comprehensive collection of data, but the system must
have substance to work upon. The computer processing capability is but one
side of the coin; an adequate data file base is the other. And the data
that comprises this base must be realistic, reflecting not only actual facts,
but what kinds (qualitative and quantitative) of facts are available.
Furthermore, the development of more effective methods to acquire, select,
extract, and preprocess the raw data is a crucially important part of the
MOD system. Thus the need for a significant data collecting effort as an
inherent part of the development of the MOD system is clear.

5.1  METHODS  OF  COLLECTING  DATA 


There  are  three  basic  methods  of  collecting  data: 

•  Field collection -- This involves direct observation and,
in a sense, represents the primary method.

•  Literature search -- Literature, here, is used in a broad
sense to include various written records -- papers, diagrams,
maps, etc. -- published or unpublished. In a sense, this
represents a secondary method since the data processor is
one step removed from the primary source.

•  Combining  groups  of  data  collected  by  others  —  In  a  sense 
this  is  a  tertiary  method  since  the  data  processor  is  two 
steps  removed  from  the  primary  source. 

The MOD project was never envisioned to have a data generating capacity
and personnel have not been available for field collection, hence this
primary method of collecting data was not used. (However, in the subsection
dealing with data extraction procedures, it will be seen that, through the
development of data extraction forms, the MOD system is certainly involved
in, and may have an important influence on, field collection.)
Literature search was the most appropriate method for collecting data
to be used in the MOD system, and most of our data collecting efforts have
been expended here. There were two principal reasons for this: first,
literature search was the simplest and least expensive method; second, and
more important, the MOD system is designed to process available data, i.e.,
data already in existence. The great bulk of existing data is available
from the literature. Obviously, then, we needed to tailor our data
collection procedures to fit this most important data source.

Use of large groups of data collected by others was seriously
considered, and, if this project had been continued to the point of
implementation, it is likely that a portion of the data collection task
would have been delegated to an organization such as the Biological Sciences
Communication Project (of George Washington University) or the BioSciences
Information Service (of Biological Abstracts).

Three broad phases can be distinguished in most data-collecting efforts
directed toward literature search. First, source documents containing
relevant data must be collected and their contents sufficiently defined
(briefly abstracted or summarized) so that they can be filed appropriately.
This is the data acquisition phase. Second, the pertinent data contained
within the source documents must be extracted (i.e., removed) so that it can
be manipulated. This is the data extraction phase. Third, this extracted
data must be put in a form (preprocessed) suitable for entry into the
computerized information system -- and entered. This is the data entry
phase.

The precise mechanism selected for data collecting should depend upon
the nature and extent of the data sources, and what resources are available
for data collecting. A logical system, which meets the MOD project's
(initial) needs, is shown in Figure 5-1. It begins with collection and
filing of data source documents by MOD in-house data-acquisition personnel
who conduct continuing, comprehensive surveillance of the literature. Data
extractors (who could be medical and/or graduate students, employed
part-time) read the source documents and extract pertinent data from them by
filling out data extraction forms that were specifically designed to obtain
mappable disease-environmental data. The completed data extraction forms are
checked, edited, and corrected by a semi-professional data analyst in
coordination with appropriate biomedical professionals who serve as data
consultants. Then keypunch operators prepare data input cards directly from
the data extraction forms. Finally, the data analyst has these cards listed,
checks the listing, makes appropriate corrections, then enters the cards into
the MOD computer system. The organizational chart shown in Section 9
(Fig. 9-2, p. 9-7) indicates the personnel required for this method of data
collection.


DATA SOURCES ACQUIRED AND FILED
(BY RESEARCH LIBRARIANS AND DATA CONSULTANTS
UNDER DIRECTION OF DATA-COLLECTION MANAGER)
        |
DATA SOURCES READ AND DATA EXTRACTION FORMS COMPLETED
(BY DATA EXTRACTORS)
        |   (DISCUSSION AND CORRECTIONS)
DATA EXTRACTION FORMS CHECKED/EDITED
(BY DATA ANALYST AND DATA CONSULTANTS)
        |
EDITED DATA EXTRACTION FORMS KEYPUNCHED
(BY KEYPUNCH OPERATORS)
        |
PUNCHED CARDS ENTERED INTO MOD SYSTEM, CHECKED/EDITED BY SYSTEM,
AND DATA LISTED FOR FURTHER CHECKING (BY DATA ANALYST)
        |   (CORRECTIONS)
CORRECTED DATA INPUT CARDS ENTERED INTO MOD SYSTEM
AND STORED IN MOD DATA FILES

Figure 5-1. A suggested pattern of data collection for the MOD system.

5.2 DATA COLLECTION ACTIVITIES

Major  efforts  at  data  collection,  related  particularly  to  data  extrac¬ 
tion  studies,  have  concentrated  on  leptospirosis.  (The  reasons  why  lepto¬ 
spirosis  was  chosen  have  already  been  discussed  —  p.  1-10).  More  than  4,500 
selected  references  have  been  collected,  approximately  half  of  which  have 
been  abstracted.  These  source  documents  were  acquired  through  personal 
inquiry,  bibliographic  searches,  and  a  continual  surveillance  of  selected 
periodicals (professional journals), under the direct supervision of
LTC William H. Watson, Jr. (DVM) VC, USAF, Chief of the Geographic Zoonoses
Branch of the Division of Geographic Pathology of the Armed Forces Institute
of Pathology (AFIP). (For more detailed information about data sources see
the Appendix.) These leptospirosis data served as the basis for setting up
the MOD source document storage-and-retrieval system and greatly influenced
our thoughts in designing the MOD data structure, the factor catalog, the
data extraction forms, and the computerized data files.

Data  collection  and  extraction,  on  the  scale  that  we  have  discussed, 
is  a  long,  slow,  tedious  process.  While  this  aspect  of  the  program  was 
developing,  we  sought  a  readily  limited  "package"  of  data  which  would  allow 
us  to  begin  work  on  defining  and  assembling  data  points,  and  producing  maps 
from these data points. Such a package was obtained from data published by
Malek (in Studies of Disease Ecology, edited by May, J.M., 1961), relating
to  schistosomiasis  in  South  America  and  Africa.  Although  these  data  are 
limited  in  their  extent  and  are  no  longer  current,  they  have  been  very 
helpful  to  us.  Use  of  these  data  has  provided  valuable  insight  into  both 
cartographic  procedures  and  problems  in  data  structuring. 

As  the  MOD  system  design  progressed,  it  became  necessary  to  test 
further  some  of  the  computer-mapping  programs  under  consideration.  Data 
required  to  make  these  tests  had  to  be  readily  available  in  a  simple  stand¬ 
ardized  (preferably  tabular)  format  and,  in  addition,  it  had  to  be  rather 
uniformly  and  densely  distributed  over  a  relatively  large  geographic  area. 
The  National  Communicable  Disease  Center  supplied  us  with  several  sets  of 
unpublished  data  on  rabies  in  the  eastern  U.  S.  that  entirely  satisfied 
these  requirements.  Unfortunately,  support  of  the  MOD  project  terminated 
before  full  use  could  be  made  of  this  material,  although  it  did  serve  its 
immediate  purpose,  and  several  maps  have  been  produced  from  it.  For  the 
same  reason,  i.e.,  termination  of  support,  we  have  not  taken  full  advantage 
(by  far)  of  the  extensive  leptospirosis  data  that  we  have  acquired  during 
the  past  two  and  a  half  years.  This  collection  has  fulfilled  a  very  im¬ 
portant  purpose,  however,  and  will  serve  effectively  as  a  major  component 
of  the  data  file  base  if  and  when  the  MOD  system  is  implemented. 

5.3 DATA EXTRACTION PROCEDURES

Once  the  data  sources  have  been  collected  and  organized,  the  task  of 
extracting  the  relevant  data  from  the  sources  begins.  When  the  MOD  project 
began, we did not realize the extent to which this would pose unusual
problems. However, the further our work progressed, the more apparent it
became that data extraction/preprocessing presented problems that were far
more complex than could have been anticipated in the beginning. Some of
these problems have been encountered by other groups, but more often avoided
rather than resolved. Some of the important problems in this area have not
been encountered by other groups so far as we can determine, probably
because no one else has attempted to develop the kind of system which MOD
represents.

Our  basic  approach  to  solving  these  data-extraction  problems  has  been 
through a group effort, involving both data-processing and data-collecting
personnel. Repeated attempts to extract and put into consistent form the
data on disease and environmental factors contained in selected
representative data sources were carried out, until, finally, the extracted
data were in a form acceptable to the data processors as well as the data
collectors/analysts. As a result of this extensive trial and error method,
general requirements for data content/format were formulated, and this has
been one of our most important accomplishments (largely due to the efforts
of Dr. Cuffey).


Our  first  major  problem  was  that  no  generic  terms  existed  which  en¬ 
compassed  disease-environmental  data.  Thus  it  became  necessary  to  construct 
a  general  data-analysis  vocabulary  before  we  could  communicate  effectively 
in  relation  to  the  disease-environmental  data  which  we  were  attempting  to 
extract.  This  data-analysis  vocabulary  includes  definitions  for  and  dis¬ 
cussion  of  the  interrelationships  among  such  vitally  important  terms  as 
"factor",  "common  elements",  "value",  "data  point",  "map",  and  "narrative". 


The  second  major  problem  was  to  specify  precisely  what  items  of 
disease-environmental information were pertinent to our major objective:
the production of disease distribution maps. This led to the development
of a catalogue of disease-environmental factors that could be used by the
MOD computer system in producing disease-environmental factor distribution
maps. These two aspects of the data problem are discussed in the preceding
section and (at least) the material contained in pages 4-4 through 4-13
should be read before considering the details of data extraction.


We have found that many of the data available for processing are
incomplete in one way or another and, often, professional judgement/
interpretation (sometimes extrapolation) must be carried out if the data are
to be usable. Narrative print-out, to accompany the computer maps, will note
these interpretations, and source document numbers will be available upon
request should the user wish to consult the data source. Some of the data
will be of very limited use because essential factors (which must have been
known to the author) simply aren't recorded. These problems are numerous and
serious (as discussed in the preceding section), but we must do the best
we can with the information available.

Extraction  form  design  is  the  key  to  successful  data  extraction.  The 
development  of  extraction  forms  has  proved  to  be  exceptionally  difficult 
because  of  the  extremely  varying  content  of  the  data  sources,  coupled  with 
the requirement that the data must relate to a consistent, geographically-
oriented format.

Initially, efforts at designing data extraction forms vacillated
between the use of free-format and fixed-format styles. The disadvantages of
each were usually more apparent than the advantages. Our early experience
led us to discount the use of fixed-format data within the computer data
files, and this bias carried over to design of the forms. This was
primarily because whenever fixed-format data is recorded, the resulting system
becomes limited.

We  have  come  to  the  view  that  the  freer  the  format  of  extraction,  the 
less clearly evident what data is desired. In addition, the freer the
format the longer it takes to extract the data from a paper, and the more
difficult it becomes to reformat the data for computer entry. Even more
important, there will be a greater loss of potentially useful data. A more
rigid format is better, primarily because, even though the literature is
extremely varied, a biomedical person can translate effectively many of the
variations into correspondences using a few standard items that will
ultimately be easier to query. Also (as a corollary of the statements about
freer format), the shorter and simpler the data form, the more data points
are likely to be extracted, thereby ensuring a more useful output.

Fixing the format of a data form in no way limits the computer
processing. The data form is designed solely to guide the extractor as to what
data is desired, and to insure uniform data extraction. For example, one
data form can be designed for leptospirosis, another for schistosomiasis,
while still another can be used for certain environmental factors. All the
data can go into the same data file for computer processing, and all the
data (assuming that it has been properly formatted) is miscible (and,
obviously, usable in an almost infinite variety of contexts).
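
The independence of form layout and file layout can be sketched in present-day terms. The Python fragment below is only an illustration: the record shape `DataPoint`, the form field names, and the sample values are all invented for the example and are not the MOD data point format, which is described elsewhere in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """Common record shape shared by every form (hypothetical)."""
    factor: str      # what was recorded, e.g. a disease or environmental factor
    place: str       # where it was observed
    value: str       # the recorded value, kept as text
    source_doc: int  # number of the source document


def from_lepto_form(form: dict) -> DataPoint:
    # Hypothetical leptospirosis form with its own box names.
    return DataPoint(
        factor=f"leptospirosis/{form['serotype']}",
        place=form["locality"],
        value=form["prevalence"],
        source_doc=form["doc_no"],
    )


def from_mammal_form(form: dict) -> DataPoint:
    # Hypothetical environmental form for small-mammal distribution.
    return DataPoint(
        factor=f"mammal/{form['species']}",
        place=form["region"],
        value=form["presence"],
        source_doc=form["doc_no"],
    )


# Two differently laid out forms feed one and the same data file.
# All values below are invented placeholders.
data_file = [
    from_lepto_form({"serotype": "pomona", "locality": "Area 1",
                     "prevalence": "12%", "doc_no": 101}),
    from_mammal_form({"species": "Rattus norvegicus", "region": "Area 1",
                      "presence": "present", "doc_no": 202}),
]

# Because the records share one shape, they can be queried together.
area_1_factors = [p.factor for p in data_file if p.place == "Area 1"]
```

In this sketch each form has its own converter, so a new form can be added without changing the file or the query.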


Data  forms  were  designed  with  careful  consideration  of  the  MOD  data 
structure and catalog of factors, and in consultation with various
biomedical and data-processing professionals. The forms were then tested
(actually used with source documents), modified as necessary, retested, etc.,
until they appeared to be both effective and efficient. Because (some)
items were required to be recorded in a fixed format rather than "natural
language", simple codes could be used on the forms to facilitate keypunching.
Furthermore, these codes were constructed so as to allow the computer system
to perform certain kinds of error checking of the input data. The latest
leptospirosis data extraction forms are given (Fig. 5-2 and 5-3), also forms
that were used to record schistosomiasis (Fig. 5-4) and rabies (Fig. 5-5)
data. One example of an environmental data extraction form, shown in
Fig. 5-6, was used to record data relating to the geographic distribution of
certain small mammals (particularly important to understanding the
epidemiology of leptospirosis). Another type of environmental data collection
form, shown in Fig. 5-7, was used in compiling a file of published maps
dealing with environmental factors of southeast Asia.

Our  tentative  scheme  for  handling  the  collected,  selected  data  is  a 
three-stage  process: 


(1) Data extractors (necessarily with biomedical
background since value judgements are required)
will fill in relatively simple data-extraction
forms. These forms will be submitted to a
data analyst(s), who, with the help of data
consultants as necessary, will check (edit)
the forms.

(2) The data analyst will transcribe the data from
the extraction form to a more rigidly formatted
(intermediate) form.

(3) The intermediate form will be converted to
punched cards for input into the computer system.

Figure 5-4 MOD data form for extracting standard schistosomiasis test
data (given earlier in Fig. 3-1).

Figure 5-5 MOD data form for extracting rabies data (see Fig. 3-44).

Elaborating upon phase 1 (above), the data extractor first obtains a
suitable source document (ordinarily from the MOD data source
storage-and-retrieval files). Then he skims the document, selects blocks of
potential data points, partially fills in several forms (with items common to
several points), and, using a duplicating machine (e.g., Xerox), makes a
number of copies of each partially completed form. Using Xeroxed forms and
unicolored pencils can save much time because of the great number of
duplicate entries required to extract many data points from one source
document. An alternative possibility would be to enter all the common items
for each document on a single master sheet, and to provide a reference by
which the data points' forms could be connected to this master sheet. Since
most documents contain several groups of data points for which several master
sheets would have to be constructed, we have found it more efficient to make
duplicate copies of partially completed forms during the data extraction.
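
The two ways of handling common items can be compared in a small sketch. This Python fragment is illustrative only; the item names, the master sheet key `M1`, and all values are invented placeholders, not entries from an actual MOD form.

```python
# Items common to a block of data points in one source document
# (all values here are invented placeholders).
common_items = {"doc_no": 101, "disease": "leptospirosis", "year": "1960"}

# Items that differ from one data point to the next.
point_items = [
    {"place": "Site A", "value": "3"},
    {"place": "Site B", "value": "5"},
]

# Approach adopted in the text: copy the common items onto every form.
duplicated_forms = [{**common_items, **point} for point in point_items]

# Alternative: keep one master sheet and give each form a reference to it.
master_sheets = {"M1": common_items}
referenced_forms = [{"master": "M1", **point} for point in point_items]


def resolve(form: dict, sheets: dict) -> dict:
    """Rebuild a complete data point from a form and its master sheet."""
    own_items = {k: v for k, v in form.items() if k != "master"}
    return {**sheets[form["master"]], **own_items}


resolved_forms = [resolve(form, master_sheets) for form in referenced_forms]
```

Both routes yield the same complete data points. The master sheet saves copying but adds a lookup step and one more sheet to keep track of for every group of points.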

Next, the data extractor carefully goes back through the source
document and completes one data form for each data point. He then adds "other
LOF's/MOF's" (and NAR), necessary to ensure that each data point is
adequately and completely defined. Finally, he talks with appropriate data
consultants to resolve any remaining questions about particular data points.

Based  upon  a  relatively  small  but  diverse  and  representative  sample 
of  data  source  documents,  we  have  found  that  approximately  two  papers,  each 
averaging 10 pages, can be extracted in depth by one data extractor in a
day. With training and experience data extractors could certainly work
faster than this, but not a great deal faster so long as they extracted
(virtually) all of the significant data. Obviously, there are far too many
disease-environmental factors described in the published literature to
extract all, or even most, of them. The most rational approach is to be
selective: to extract in depth only for highly relevant factors, and
(unless there is particular reason not to) restrict attention to those items
which are necessary to construct data points (as described in Section 4).
Data  points  with  many  qualifying  LOF's/MOF's  may  be  used  so  seldom  in  re¬ 
trieval  and  mapping  operations  that  their  value  is  not  worth  their  cost. 
These two methods of approach, coupled with careful selection of the source
documents, will help with the data volume dilemma, and still provide
sufficient kinds and numbers of data points to make computer processing
worthwhile. We have mentioned before that a large volume of environmental data
is already published in map form and that computerized maps of the type we
are discussing can be made to fit with (e.g., overlay) a published map,
obviating the need to extract and process that data which has already been
mapped. The use of secondary source documents (compiled data) and data
presented in tabular form would also materially reduce extraction
requirements.
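
The scale of the data volume dilemma can be made concrete with a rough calculation. The sketch below applies the rate quoted above (about two papers per extractor per day) to the collection of more than 4,500 references mentioned in Section 5.2. It assumes, purely for illustration, that every reference resembles the 10-page papers in the sample. The 50 percent selection is a hypothetical figure, not one taken from the project.

```python
import math


def extractor_days(n_documents: int, docs_per_day: float = 2.0) -> int:
    """Whole extractor-days needed at a steady extraction rate."""
    if docs_per_day <= 0:
        raise ValueError("docs_per_day must be positive")
    return math.ceil(n_documents / docs_per_day)


# Every collected reference extracted in depth.
days_all = extractor_days(4500)

# A hypothetical selection of half the collection.
days_half = extractor_days(4500 // 2)
```

At this rate the full collection would take about 2,250 extractor-days, and the hypothetical half selection about 1,125.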

In  the  discussion  of  time  and  effort  involved  in  data  extraction  we 
have  considered  primarily  the  average  time  required  to  extract  "representa¬ 
tive"  source  documents,  but  the  source  documents  vary  enormously.  Some 
short  papers  yielded  as  many  as  50  data  points;  others,  of  comparable  size, 
yielded  only  one.  Furthermore,  the  structure  and  the  language  of  source 
documents can make it difficult (and time consuming) or easy (and relatively
quick)  to  extract  the  data  necessary  to  produce  data  points. 

Based  on  our  experiences,  there  will  be  considerable  personnel  prob¬ 
lems  among  data  extractors.  Extraction  efforts  are  very  demanding  and 
fatigue  develops  far  out  of  proportion  (seemingly)  to  the  work  actually 
accomplished.  Furthermore,  we  observed  lack  of  day  to  day  consistency  in 
the  type  of  data  actually  recorded  on  the  data  extraction  forms.  The  task 
is  very  boring  and  it  is  easy  for  the  extractor  to  become  distracted.  In 
our experiences several people, each working part-time, were better than one
working full-time since this reduced boredom and the frequency of
distraction. Some consistency is lost, but, on different days, the output of a
single extractor was as inconsistent as that from several different
extractors. It seems likely that but a small proportion of persons will be found
to  have  the  psychologic  and  intellectual  qualities  necessary  to  become 
excellent  data  extractors. 

Many  problems  appeared  during  our  attempts  to  develop  data  extraction 
techniques. Those that relate to data characteristics, such as
incompleteness and unreliability of data points, assignment of values, accurate
specification of disease measures and samples involved, and precise statement
of the geographic locations of data points, have already been discussed. The
great variation in quality of data sources, as well as in the numbers of
data points extractable from single source documents, has also been
considered. Perhaps the most important difficulty lies in the fact that much
of  the  material  in  a  typical  narrative  paper  is  not  mappable.  Moreover, 
that  material  which  is  mappable  may  require  the  extractor  to  read  large 
sections  of  the  paper  and  then  piece  together  —  using  professional  judg¬ 
ment  —  the  few  disconnected  fragments  of  critical  data,  floating  in  a  sea 
of  words  that  contribute  no  data  but  yet,  somehow,  seem  necessary  for  the 
paper  to  be  understandable.  Extracting  data  is  clearly  not  a  simple  matter; 
there is no automatic procedure by which narrative sentences can be
converted into computer-mappable data points. Putting it another way, "data
processing"  (at  this  level)  becomes  heavily  involved  with  "communication 
science"  and  general  semantics. 

*  *  * 

In summary, our experiences with data-management in the context of
this comprehensive program have exposed complexities of a degree that we
could not have anticipated. It has become clearly evident that the most
critical factor limiting meaningful computer output of the MOD system is
the content/format of input data. The sources of the data are readily
available, but there are major difficulties in extracting/formatting these
data. These problems relate to:


(1) Highly varying source document content (requiring
development of a data-analysis vocabulary and a
factor catalogue to establish common denominators).

(2) Highly varying reliability of raw data (requiring
a system for defining reliability and, on occasion,
validating data).

(3) Necessity for continual changes in and additions
to the data base file (making unusual requirements
for editing and updating).

(4) Lack of a generic vocabulary encompassing
medical-environmental situations (related to item 1).

(5) Inherent complexities in the data which make it
difficult to specify a standardized procedure(s)
for the extraction, editing, structuring, and
storing of the data prior to computer input.

(6) Data file design problems due to complexities of
the data in general, its great volume and the
large number of interrelationships among the
specific data and among descriptions associated
with vocabulary/definitions after computer input.

But turning to a more positive view, our efforts have resolved (in large
measure) most of these problems. The evidence for this is in the form of
computer produced medical-environmental maps, maps derived from data that
were collected, extracted, and input in the manner described in this section.

5.4 DATA INPUT OPERATIONS

The  last  phase  of  the  MOD  data  collection  process  involves  transferring 
the  data  contained  on  the  data  extraction  forms  into  the  MOD  computer  system 
data files. The basic procedures for this are illustrated by an actual
example  as  shown  in  Fig.  3-8. 

The data extraction forms filled out by the data extractors are first
examined, edited, and corrected, if necessary, by a data analyst. This
person, functioning as an intermediary between the data extraction personnel
and the computer system, must be familiar with both biomedical and
automatic data processing matters so that he can detect (and correct) erroneous
or  inappropriate  data  point  values,  inconsistencies  and  errors  in  data  for¬ 
mat,  unallowable  LOF's  in  particular  MOF's,  omission  of  critical  items  in 
the  data  point  records,  and  the  like.  Questions  which  the  data  analyst  can¬ 
not  resolve  will  be  answered  by  the  data  consultant. 

The data analyst then gives the edited data extraction forms to
keypunch operators who punch the data directly into standard 80-column punch
cards. These cards have been chosen as the input medium for the MOD system
(as discussed in the section on computer system requirements) primarily
because of the flexibility which they provide in all phases of data processing.

Compromise is inevitable in selecting the form of the input. A natural
or problem oriented language is easier for the data processor to use whereas
a fixed-form input format is easier for the computer to handle. A proper
compromise is one in which the kind of language input format developed best
suits the total procedure. Arriving at this proper compromise represents a
critical step since an inappropriate selection would lead to much delay and
costly duplication of effort. Before we took this critical step a number of
trials were conducted, and these involved use of fixed-field and
variable-length records, numerical codes, alphanumeric (mnemonic) codes, and
essentially natural language-type statements. We concluded that information
appearing on the data input cards should be in a fixed format: partly
fixed card-column, but mostly fixed order and punctuation. (The data input
card formats are discussed in detail later in connection with data processing
operations.) Because relatively little data has been key punched according
to the full data point format developed for the MOD system, we cannot give
precise estimates of the effort required to process large numbers of data
extraction forms.
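
A card that is "partly fixed card-column, but mostly fixed order and punctuation" can be pictured with a short Python sketch. The layout used here is an assumption made up for the example: a document number in columns 1-6, then three items in a fixed order, separated by commas and closed by a semicolon. It is not the actual MOD card format, which is described in the data processing section.

```python
CARD_WIDTH = 80  # columns on a standard punched card


def parse_card(card: str) -> dict:
    """Parse one card image under a made-up layout.

    Columns 1-6: source document number (fixed card columns).
    Columns 7-80: factor, place, value in that order, separated by
    commas and ended by a semicolon (fixed order and punctuation).
    """
    if len(card) > CARD_WIDTH:
        raise ValueError("card image longer than 80 columns")
    card = card.ljust(CARD_WIDTH)
    doc_no = int(card[0:6])          # fixed-column part
    body = card[6:].strip()          # fixed-order, punctuated part
    if not body.endswith(";"):
        raise ValueError("missing terminating semicolon")
    items = [item.strip() for item in body[:-1].split(",")]
    if len(items) != 3:
        raise ValueError("expected factor, place, value")
    factor, place, value = items
    return {"doc_no": doc_no, "factor": factor, "place": place, "value": value}


# Invented sample card image.
record = parse_card("000101LEPTO,AREA 1,12;")
```

Only the document number depends on column position in this layout. The remaining items may be of any length, which is the flexibility that fixed order and punctuation provide over a purely fixed-field card.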

The next step in the data input process is for the data analyst to
input the punched cards to the MOD computer system. The system reads the
cards, performs the kinds of error checking for which it is programmed,
prints out a listing showing all the data cards and also indicates the
errors found.

The data analyst carefully examines the listing for possible errors
of the kind which the system cannot detect. He corrects the errors which
he finds, and those which the system noted, by repunching appropriate data
input cards.

Finally, the data analyst puts the corrected deck of punched cards
back into the computer system. The system reads the cards and stores their
contents internally in its various files.
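
The listing step can be sketched as follows. This Python fragment is a minimal illustration under assumed rules: the three checks, the listing layout, and the sample cards are all hypothetical and stand in for whatever checks the MOD system is actually programmed to make.

```python
def check_card(card: str) -> list[str]:
    """Return error messages for one card image (checks are illustrative)."""
    errors = []
    if len(card) > 80:
        errors.append("longer than 80 columns")
    if not card[:6].isdigit():
        errors.append("document number not numeric")
    if not card.rstrip().endswith(";"):
        errors.append("missing terminator")
    return errors


def list_deck(deck: list[str]) -> list[str]:
    """Produce a listing: every card, followed by any errors found."""
    listing = []
    for number, card in enumerate(deck, start=1):
        listing.append(f"{number:04d}  {card}")
        for message in check_card(card):
            listing.append(f"      *** {message}")
    return listing


# Invented two-card deck; the second card has two deliberate mistakes.
deck = ["000101LEPTO,AREA 1,12;", "00X102LEPTO,AREA 2,7"]
listing = list_deck(deck)
```

A program of this kind can only flag violations of form. A wrong but well-formed value passes unmarked, which is why the analyst still reads the full listing before the corrected deck is entered.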

At this point, the data collecting tasks are completed and the
computer system is prepared, so far as its data file base is concerned, to
manipulate data in response to query.

*  *  *

We emphasize once again the importance and difficulty of the research
effort that has been necessary to process the raw data before they reach
the computer. Without adequate and properly formatted input, significant
output is impossible. The old term, GIGO, expresses the situation well:
garbage in/garbage out.

ABSTRACT - This section reflects
the system analysis phase of the MOD
project. Having established output
requirements, and having characterized
input, the hardware and software
necessary to operate the system are
specified. The General Considerations
portion of the section is directed to
those who have little background
knowledge of computer science/technology,
and attempts to give a basic
orientation.


"He who desires the ends desires the means."

6.0 GENERAL CONSIDERATIONS

A  broadly  based  automated  system  designed  to  study  the  geographic 
distribution  of  infectious  diseases  will  succeed,  as  previously  stated, 
only if computer techniques can be effectively applied to it. The basic
aspects of computer systems which are to be presented here will (hopefully)
give insight into the applicability of these systems to the MOD program,
and the problems involved in making the application.

The  electronic  computer  is  one  of  the  most  powerful  tools  man  has 
ever  devised.  It  is  being  applied  to  evaluation  or  control  of  more  and 
more  areas  of  man's  environment  —  economics,  science  and  technology,  in¬ 
dustry,  and  education  —  to  accomplish  tasks  which  were  formerly  considered 
beyond the scope of human ability, or which required an ever increasing
staff  of  people  to  accomplish.  In  addition  to  these  many  new  tasks,  all  of 
us  hope  that  the  computer  will  provide  a  means  to  free  us  from  mundane, 
repetitive  tasks  in  the  performance  of  many  of  the  old  tasks. 

In  the  final  analysis,  however,  the  computer  is  recognized  as  just  a 
tool  —  a  tool  to  be  manipulated  not  by  man's  hands  but  by  his  mind.  Engi¬ 
neers  have  built  into  computers  the  capability  to  perform,  but  it  is  the 
programers  who  actually  cause  the  computers  to  perform.  This  combination 
of  engineering  and  programing  talents  is  integrated  within  the  computer 
system  as  a  latent  talent.  It  is  the  user  who  actually  supplies  the  motiva¬ 
tion  (in  a  sense)  by  presenting  the  problem  to  be  solved. 

The  computer  contains  a  control  unit  which  allows  it  to  sequence  from 
one  operation  to  the  next  while  processing  a  large  stream  of  calculations. 
Numbers  stored  in  the  computer  represent  either  data  or  instructions  and  it 
is  the  programer's  task  to  cause  the  computer  to  act  on  the  numbers  in  the 
correct  manner.  The  fact  that  both  arithmetic  and  logical  operations  are 
possible has permitted computers to be used in a wide variety of
applications. The single characteristic which has contributed most to their
popularity, of course, is the rapidity with which they perform complex as
well as simple operations.




Perhaps  one  of  the  most  important  contributions  of  computer  technology 
is that it has forced man to state problems in entirely logical terms so
that  the  computer  can  solve  them.  The  opposite  side  of  this  coin  is  that 
any  problem  that  can  be  stated  logically  or  expressed  in  terms  of  mathe¬ 
matical  equations  can  (in  principle)  be  solved  by  a  computer. 

Much of the information presented in Section 6.0 is elementary and
aimed  at  bio-medical  personnel  who  have  not  had  occasion  for  even  an 
elementary  consideration  of  computer  technology.  Those  who  are  computer 
oriented  are  advised  to  turn  to  Section  6.1. 

The  computer  performs  storage  and  retrieval  functions  in  much  the 
same manner as a human being or a calculating machine. The computer
consists of large blocks of equipment ("hardware") containing many transistors,
tubes, and other basic electronic components. Most computers are organized
to handle five basic functions: (1) input, (2) storage, (3) control,
(4) processing, and (5) output. Before solving a problem, the pertinent
facts and data must be input (by means of electromechanical devices such
as card or paper tape readers, keyboards, etc.) and stored (on tapes, disks,
drums, cores, etc.) much as a human being gathers facts and stores them in
his brain. Once the data are stored, a control section selects them, one item
at a time, and processes each in its arithmetic element. The control function
is simply the means of following instructions precisely as programed. The
computer must be instructed (programed) every step of the way. Results are
useful only after they are output (displayed by a printer, plotter,
cathode-ray tube, etc.) or re-stored (back in memory, punched on cards, or
communicated to remote devices) for later use.

We have mentioned that the computer must be programed to acquire an
ability to solve problems. This is because it is impractical to build a
computer capable of interpreting the wide variety of instructions that a
human being could understand. And this is the reason that computing procedures
are broken down to a relatively few different types of instructions. Hence,
the general plan of action written out (or encoded) as a specific set of
operational instructions may be long and involved, even for an apparently
simple problem. (Programs are commonly referred to as "software" to
differentiate them from the computer "hardware".) We have also mentioned
that the computer operates on numbers, both as data and as instructions;
however, the computer does not use the familiar decimal system (with ten
digits), but, as a rule, uses a binary system in which there are only two
digits (bits): 0 and 1. (The next number larger than 1 is, therefore, 10.) These
characteristics, which make the computer so extremely flexible and versatile,
also make it necessary for all actions to be defined in great (precise)
detail. It is because of this that a major part of the human work
involved in solving a problem on a computer is in preparing the program.
Frequently, many man-hours are required to prepare for a few minutes of
actual computer operation.
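
The remark that the number following 1 is written 10 in the binary system can be checked with a short sketch. The function name `to_binary` is an illustrative assumption, not part of the MOD system; the method shown (repeated division by two) is one common way of converting, written here in a present-day language for convenience.

```python
def to_binary(n: int) -> str:
    """Return the binary numeral for a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # each remainder is the next digit, lowest first
        n //= 2
    return "".join(reversed(digits))


# Count from zero to four: 0, 1, 10, 11, 100.
counting = [to_binary(value) for value in range(5)]
```

Counting upward this way shows why two follows one as "10": with only two digits available, the second position is needed as soon as the first is exhausted.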

Because programing directly in machine language is tedious as well
as time consuming, computer manufacturers supply (with their machines) programs
that interpret so-called interim languages (which are much easier for
the programer to use), that is, programs that convert an interim-language
program into an equivalent machine-language program. A simple example is the
assembler, which translates sequences of characters into other sequences of
characters and, in so doing, puts together, i.e., assembles, a program.

For example, the characters "ADD", representing the function of addition
in the programer's language, can be changed into the binary configuration
(which might be "111011") that actually causes the computer to perform
addition. In a sense, the assembler is acting as an interpreter. But this
still leaves the programer many tedious operations. To provide further
relief, more sophisticated methods have been developed. These methods usually
involve a higher level interim language, approaching more closely
natural English. Such a language is then interpreted by a special program
that translates it into a machine language, going beyond the one-to-one
stage since several commands may have to be specified for each input
language expression. A program of this sort, which produces other programs,
is called a compiler.
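
The difference between the one-to-one translation of an assembler and the one-to-many translation of a compiler can be sketched as follows. Everything named here is hypothetical: the opcode and address tables, the twelve-bit word layout, and the function names are assumptions chosen for illustration and do not describe any particular machine.

```python
# Hypothetical opcode table; every real machine defines its own codes.
OPCODES = {"LOAD": "110001", "ADD": "111011", "STORE": "110010"}
# Hypothetical symbol table giving each name a six-bit storage address.
ADDRESSES = {"A": "000001", "B": "000010", "C": "000011"}


def assemble(source_lines):
    """Translate each 'MNEMONIC OPERAND' line into exactly one machine word."""
    words = []
    for line in source_lines:
        mnemonic, operand = line.split()
        words.append(OPCODES[mnemonic] + ADDRESSES[operand])
    return words


def compile_assignment(statement):
    """Expand one statement of the form 'X = Y + Z' into several assembly lines."""
    target, expression = (part.strip() for part in statement.split("="))
    left, right = (part.strip() for part in expression.split("+"))
    return [f"LOAD {left}", f"ADD {right}", f"STORE {target}"]


assembly = compile_assignment("C = A + B")  # one statement becomes three commands
machine_code = assemble(assembly)           # three commands become three words
```

The single higher level statement yields three commands, while each command yields exactly one machine word, which is the distinction drawn in the text.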

Since computers are designed for a variety of uses, ordinarily no two
computer manufacturers (and often no two computer types produced by a
single manufacturer) produce machines which use the same internal codes to
represent commands; thus a program written in an assembly language for one
computer will not operate directly on another computer. This is another
reason that higher level languages have been implemented on a variety of
computers. (As we have implied, they also permit faster and more efficient
programming.) There are several higher level languages, since problems fall
into reasonably well defined categories, each of which requires a different
method for solution. For example, languages for mathematical applications
have been developed which are quite different from languages for business
applications.

Today, techniques for producing computers are far ahead of techniques
for using them. The technique of step-by-step coding of programs is wasteful
of time, money, and personnel. Soon, perhaps, the computer itself can
be directed to do much of the work of coding, i.e., automatic coding of
programs will be possible. But until that time comes, we must accept the
fact that it is time consuming and costly to produce computer programs.
Because of the intimate relationships between software (programs) and hardware,
a critically important part of system design is the selection of
hardware. Of course the equipment must be capable of meeting task-requirements,
but it should also be of such design as to minimize the amount and
complexity of software that will be necessary to operate the system.

6.1 HARDWARE REQUIREMENTS

The basic considerations in determining computer hardware requirements
are:

(1) The overall amount of data to be stored and the
required frequency of access.

(2)  The  amount  of  data  which  must  be  considered  at  the 
same  time. 

(3)  The  method  by  which  the  data  must  be  processed. 

(4)  The  form  in  which  the  data  must  be  input. 

(5)  The  form  in  which  the  output  is  desired. 

(6) The cost (including that of the required software).

6.1.1  OUTPUT  DEVICES 

Output devices are a primary consideration since the MOD system is
centered around output, and we shall consider these first.

The input-output equipment of a computer is sometimes referred to as
peripheral. If operated and controlled by the computer itself, it is on-line;
if operated independently of the computer, it is off-line. In relatively
slow computer systems the peripheral equipment is frequently on-line,
but, to avoid holding up an expensive fast computer for time-consuming
input-output operations, off-line techniques are often used. Any of the three
output devices described below can be operated either on- or off-line.

(1) Line-Printer: In essence this is a very large rapid typewriter
roll that prints an entire line of at least 100 characters, virtually at one
time. In any case, the entire line is composed within the controller prior
to printing. While a line-printer is relatively fast (up to 1000 lines per
minute), it has a limited character set (usually fewer than 64), and prints
letters only in upper case. It can, however, produce a rough map conveniently,
rapidly, and inexpensively, since almost all computer installations
have access to a line-printer. In addition, the line-printer provides the
most advantageous way of generating hard-copy reports directly and rapidly.
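
A rough line-printer map of the kind described can be sketched by assigning each grid value one printable character, composing each line in full before it is printed. The function name, the six-symbol shading scale, and the sample grid are assumptions made for illustration; they are not taken from any MOD program.

```python
def line_printer_map(grid, symbols=" .:+*#"):
    """Render a 2-D grid of numbers as text lines; later symbols mean larger values."""
    flat = [value for row in grid for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1  # guard against a grid in which every value is equal
    lines = []
    for row in grid:
        chars = []
        for value in row:
            # Scale the value to an index into the symbol list.
            index = int((value - low) * (len(symbols) - 1) // span)
            chars.append(symbols[index])
        lines.append("".join(chars))  # the whole line is composed before printing
    return lines


sample = line_printer_map([[0, 1, 2], [3, 4, 5]])
```

Because only one character is available per grid cell, the resolution of such a map is limited by the character spacing of the printer, which is why it serves for an overview rather than for finished maps.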

(2) Plotter: This is a marker (pen of one color ink, a group of pens
of various color inks, or a scribing point) that is mounted on a self-propelled
movable stand which draws patterns on drafting material (blank or gridded
paper, or drafting plastic, or a pre-printed base map). Plotters may be
operated on-line, but to conserve computer time, are more commonly operated
off-line (by a magnetic tape produced on a computer, then taken off the
computer and put onto the independent control unit of the plotter, which is
separate from the computer). Plotters tend to be relatively slow; they may
take up to several hours to draw a moderately complex map. However, they
provide the highest resolution maps. A plotter commonly is limited to drawing
straight lines from one point to another; nevertheless, it is a very
flexible instrument since any curved line can be composed of multiple short
straight line segments. Furthermore, lettering is easily performed since
characters may also be composed of short straight line segments. Plot size
is the primary limitation of plotters. One type, called a flatbed plotter,
employs a flat drawing board with sharp limits of both width and length.
Another type, called a drum plotter, utilizes a cylindrical drum around
which the drafting material is wound; width is limited but not length. A
plotter in the low- to medium-price range should be capable of providing
finished maps, including legends, up to 30" in width.
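
The statement that any curved line can be composed of short straight segments can be illustrated with a circle. The sketch below is an assumption-laden illustration (the function names and segment counts are arbitrary): it generates the pen positions for a circle and measures the total stroke length, which approaches the true circumference as the segments are shortened.

```python
import math


def circle_as_segments(cx, cy, radius, n_segments):
    """Return the n_segments + 1 pen positions of a closed polygon approximating a circle."""
    points = []
    for i in range(n_segments + 1):
        angle = 2.0 * math.pi * i / n_segments
        points.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    return points


def path_length(points):
    """Total length of the straight pen strokes joining consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


coarse = path_length(circle_as_segments(0.0, 0.0, 1.0, 8))    # octagon
fine = path_length(circle_as_segments(0.0, 0.0, 1.0, 360))    # one-degree steps
```

The finer path is closer to the circumference 2π but requires forty-five times as many pen movements, which is one reason a detailed map may take a plotter hours to draw.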

(3) Cathode-Ray Tube (CRT) Display: This is a vacuum tube, similar
to a television screen, in which a beam of electrons can be focused to a
small point on a luminescent screen and varied in both position and intensity
to form a pattern. A CRT operates on principles entirely similar
to those which have been described for a plotter, but an electron beam is
substituted for a marker pen and electronic control is substituted for
electro-mechanical control. Whereas the plotter produces hard-copy output
directly, the CRT screen must be photographed to obtain a lasting image.

The CRT is very much faster than the plotter, but offers considerably less
resolution (at its present stage of development). Furthermore, the cost
of a CRT capable of meeting MOD output requirements would be prohibitive.
Present MOD requirements can be satisfactorily met by using a line-printer
to provide rapid map output (for an over-view evaluation) and by using
a plotter to produce high resolution maps when these are required.

6.1.2 INPUT DEVICES

Input devices are of secondary importance, but important, nevertheless,
since there is, potentially, a very large volume of data to be used
in the MOD system, data which will cover all pertinent disease/environmental
situations. Input devices (as are output devices) are often off-line to the
central computer system; a medium such as magnetic tape serves as an intermediary.
The critical problem of input lies with the conversion of raw
data into a form that is acceptable to computer input devices. Input of
queries into the MOD system is a closely related problem and sufficiently
similar that a solution of the one should also satisfy the other. Potentially
useful input devices include:

(1) Optical Character Recognition (OCR) Device: This is a device
by which text can be read directly from documents and automatically translated
into machine language for direct input to the computer. OCR would
be of great advantage if entire reports were to be read, but in the MOD
system, scientific papers are only the background material for preparing
data points. Extracted data (recovered throughout the entire paper) must
be converted to suitable computer input. Transcription of the extracted
data could be accomplished by typing the data with a special-font typewriter
for optical reading or by keypunching the data onto punched cards.
Either of these methods seems preferable to the use of OCR devices at the
present time. (The high cost of OCR devices and the present rather limited
state of the art were also considered in arriving at the above conclusion.)

(2) Punched Paper Tape Reader: This is a device for converting
information on paper tape, punched or otherwise marked, and transferring it,
one character at a time, to the computer. Paper tape is a relatively old
and well standardized medium that was developed to permit more efficient
use of the telegraph line. Tapes could be produced by a typist, at the
typist's own rate, and the information contained on them transmitted to
and from punched paper devices at the maximum rate of the transmitting and
receiving equipment. Holes punched in a moving strip of paper represent
alphanumeric characters. Multiple channels on the paper tape (usually
eight) are read simultaneously, permitting an entire character to be read
at one time. Punched paper tape has been adapted to computers and is used
extensively, particularly in less expensive systems. The paper tape reader
is the least expensive type of input device.

A major disadvantage of paper tape is its inflexibility. Once it is
punched, alteration, including insertion, is not possible. Correction is
possible as the tape is prepared initially, but this does not suffice for
the MOD system since there is need to modify data as it is input and processed.
Furthermore, the paper tape does not offer the "unit-record" capability
provided by other input media, and this would be a serious limitation
to the MOD system.
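
The way eight channels read together yield one character can be sketched as below. This is an illustration only: it assumes a present-day eight-bit character code rather than any tape code actually in use, and the function names and the hole and blank symbols are arbitrary choices.

```python
def punch(text):
    """Return one row per character; 'o' marks a hole, '.' an unpunched channel."""
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 255:
            raise ValueError("character does not fit in eight channels")
        bits = format(code, "08b")  # eight channels, one bit each
        rows.append(bits.replace("1", "o").replace("0", "."))
    return rows


def read_tape(rows):
    """Read each row (all eight channels at once) back into a character."""
    return "".join(chr(int(row.replace("o", "1").replace(".", "0"), 2)) for row in rows)


tape = punch("MOD")
recovered = read_tape(tape)
```

The rows form a single continuous strip, so a character cannot be inserted in the middle without re-punching everything that follows; this is the inflexibility noted above.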

(3) Punched Card Reader: This is a device for sensing holes punched
in cards and translating that information into a form acceptable to the
computer. For our purposes we can limit the discussion of punched cards to
the widely used 80-column IBM punched card with its Hollerith coding representation.
The punched card well reflects the unit-record concept. Each
card contains 80 columns (characters) of information which can be altered
without affecting the other cards in the complete record; the physical
characteristics of the card facilitate sorting, collating, and other data-handling
operations. Because punched cards have been in use for a long
time (since the late 1800's), much auxiliary non-computer machinery has been
developed to handle them, e.g., keypunch machines, designed to record data
on a card in the form of punched holes, in response to an operator who
strikes the appropriate keys of a typewriter-like keyboard.
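
The unit-record concept can be sketched by treating each card as an 80-character image with fixed fields. The field layout, the function names, and the sample values below are all made up for illustration; they do not reproduce any MOD card format, and the Hollerith hole patterns themselves are not modeled.

```python
CARD_WIDTH = 80

# Hypothetical field layout: (name, starting column, width), columns numbered from 1.
LAYOUT = [("disease", 1, 20), ("locality", 21, 30), ("year", 51, 4), ("cases", 55, 6)]


def punch_card(record):
    """Build one 80-column card image from a dict of field values."""
    card = [" "] * CARD_WIDTH
    for name, start, width in LAYOUT:
        text = str(record.get(name, ""))[:width].ljust(width)
        card[start - 1:start - 1 + width] = text
    return "".join(card)


def read_card(card):
    """Recover the field values from an 80-column card image."""
    return {name: card[start - 1:start - 1 + width].strip() for name, start, width in LAYOUT}


# Illustrative values only; these are not data from the project.
deck = [
    punch_card({"disease": "MALARIA", "locality": "EXAMPLE DISTRICT", "year": 1966, "cases": 120}),
    punch_card({"disease": "CHOLERA", "locality": "EXAMPLE PORT", "year": 1967, "cases": 15}),
]
# Correcting one record means re-punching only that card; the rest of the deck is untouched.
deck[1] = punch_card({"disease": "CHOLERA", "locality": "EXAMPLE PORT", "year": 1967, "cases": 51})
```

Replacing the second card leaves the first unchanged, which is the property that paper tape lacks.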

(4) Digitizer: This is an analog-to-digital converter device in
which the operator moves a pointer/sensor along a curve or to a point on the
drawing board and presses a button, whereupon the digitizer machine reads
the (X,Y) coordinates of the sensor, transmitting these directly into the
computer or onto another medium (e.g., punched cards or magnetic tape).
The Army Map Service and the Bureau of the Census both utilize digitizers
in their work. In the MOD system, digitizers could be quite useful in
translating data from existing maps into computer form for use in subsequent
processing.

6.1.3  STORAGE  DEVICES 

All computers have a rapid access storage device, usually randomly
accessible. It is in this primary, i.e., main, storage device that program
instructions and data are stored and from which instructions are retrieved
by the control unit and executed. Main storage is of interest only in that
there is a minimum requirement (to be discussed later) for the MOD system.

In addition to the computer's main storage, auxiliary storage devices
must be provided in which MOD data of all types will be filed. Auxiliary
storage has a much greater capacity than main storage; however, the information
is less rapidly accessible. Three of the various types of auxiliary
storage devices are described below.

(1) Magnetic Drum: is a rapidly rotating cylinder whose outer
surface is coated with magnetic material. It provides moderately rapid
random-access storage. It is expensive.

(2) Magnetic Disk: is a stack of rapidly rotating flat disks,
having their flat surfaces coated with magnetic material. It provides
moderately rapid random-access storage. It is moderately expensive.

(3) Magnetic Tape: is a steel or plastic tape coated with magnetic
material and wound on a reel. It provides slow sequential-access
storage (however, data can be arranged initially -- by card-sorters -- or
later -- by the computer itself -- so that long, slow random searches will
seldom be necessary). It is inexpensive.

6.1.4 CENTRAL PROCESSING UNITS (CPU)

The central processor (CPU) of the computer system normally consists
of the main storage, arithmetic unit, control unit, and special register
groups. It is the principal unit of the computer; it controls the processing
routines, performs the arithmetic functions, and maintains a quickly
accessible memory. The design characteristics of a computer are most
noticeably reflected by the CPU. CPU considerations of the MOD system which
affect the selection of a computer deal with the processing commands (steps)
designed into the computer, the processing speed, and the input-output
interfacing. Some computers are much more appropriate than others for solution
of particular problems, due to the amount of main memory and the internal
processing speed (influenced by the complexity of the machine
commands). The amount of input-output performed can greatly influence
operation unless techniques are provided to avoid interruption of processing
while input-output operations are performed.

6.1.5  AVAILABLE  SERVICES 

There are three ways in which the MOD project staff could satisfy its
computer requirements:

(1) Purchase its own computer.

(2) Rent or lease a computer to be installed on
the AFIP premises.

(3) Rent time on a computer installed on another
organization's premises (either a nearby
government agency, or a cooperating
university such as the University of
Illinois). The Computer Sharing Exchange
(part of the General Services Administration)
maintains a record of all government computers
in the Washington, D.C. area on which time
would be available at a nominal cost. Computer
manufacturers also represent a potential
source of computer time in the Washington,
D.C. area.

6.2 SOFTWARE REQUIREMENTS

There are several types of languages available for use in programming
the MOD system. The choice is somewhat dependent upon the computer hardware
selected because many languages have been implemented for only a
limited number of computers. In general, low level, or assembly, languages
will not be used, as the time and effort involved in programming in these
languages are greater than those required when a more universal compiler
language is used. The most widely used and generally applicable of the
higher level languages are described below.

6.2.1  AVAILABLE  LANGUAGES 

Each of the available languages was developed by an individual or a
group to satisfy a very general need in a particular type of application.
It is for this reason we must consider several of the higher level
languages. These languages are procedure-oriented and machine-independent.
None of these languages can be executed directly by present computers without
first being "processed" into machine language, but this is their advantage.
This design allows them to be implemented for a variety of computers
with basically no changes in the language itself.

(1) ALGOL: was developed in Europe to be an internationally accepted
procedure for designing mathematical, engineering, and scientific problems.
Compatible standardization and understanding of problems and procedures to
be used with or without computers were the primary objectives in developing
ALGOL. The language provides precise instructional statements and ways of
expressing problem-solving order and procedure. The ALGOL language (or
abridged subsets of it) is currently available for a limited number of
computers.

(2) COBOL: was developed by a consortium of computer manufacturers
and users (including groups within the Federal Government). It grew out of
a desire for a language that would be a "shorthand" for computer instruction,
yet derived from English. It resembles English, so the programmer
can work with it easily without having to learn many special symbols and
codes, and special rules for using them. Instructions written in COBOL are
sentences that are meaningful even to the casual reader. The COBOL language
is capable of describing business problems of many kinds and can easily
specify the basic steps required to solve them.

(3) FORTRAN: was developed by IBM and is presently the most widely
used language for scientific problems and programs. It was developed in
the United States in parallel with ALGOL (in Europe). Both languages are
attempts to provide a programming language similar to everyday mathematical
notation so that engineers and scientists can avoid the repetition
and drudgery of machine programming, and both succeed. Newer versions
of FORTRAN include features formerly available in ALGOL alone. The grammar,
symbols, rules, and syntax used are, for the most part, easily learned
since they follow conventional mathematical and English-language usage, but
the instructions must be explicit.

6.2.2  AVAILABLE  SERVICES 

There are two ways in which the MOD project could obtain the services
necessary to implement the required programs:

(1) Hire its own programming staff.

(2) Contract the programming tasks to a
professional data-processing
organization.

6.3  CONCLUSIONS 


The MOD system's output can and usually will be in the form of maps
drawn off-line by an ink-on-paper plotter. For making interim maps, a
high-speed printer or a CRT-microfilm plotter could be used; however,
because of the limited selection of characters available on the former
and the limited precision (on a single plot) of both, maps produced by
either of these devices are likely to be of significantly lower quality
than those produced by an ink-on-paper plotter. Other output media are
either inapplicable, too inflexible, too slow, or too expensive for our
purposes.

Out of the vast array of possible input devices, it seems most
practical for the MOD system to adopt the widely-used punched cards and
magnetic tape for input, though possibly digitizers may prove useful to
input data which is already in map form. The nearly universal use of
punched cards and magnetic tape has resulted in a substantial body of
equipment and experience which will be of great value in handling these
two input media. The other media are not applicable in this project because
they are not sufficiently flexible, or too expensive, or not sufficiently
developed to be practical at this time.

At present, we believe that sequential-access (magnetic tape)
storage will be adequate for the MOD system initially (in addition to the
direct-access main storage element of the computer itself). Later, however,
it may prove desirable to add random-access (preferably magnetic disk)
storage to the system. The size of main memory is a much more important
factor in mapping requirements than it is in data storage and retrieval
requirements for the following reason. In order to construct contoured or
shaded maps a grid must be employed. A very general trend map can be prepared
by utilizing a 10 x 10 grid (100 points); alternatively, a fairly
detailed map can be prepared by utilizing a 100 x 100 grid (10,000 points).
Since each point consists of three values (X, Y, and Z), and one computer
word is required for each value, main memory must contain at least 30,000
words to produce a "more detailed" map. This requirement, plus the main
memory requirement for storage of the computer program itself, brings the
total main memory requirement to approximately 50,000 words.

The tasks to be performed by the CPU in the MOD system are primarily
of a logical rather than a mathematical type. While speed is not a prime
factor in the information storage and retrieval tasks, it is a factor in
the processing of the hundreds of thousands of data points which are required
in mapping. This means that the computer selected must represent
a compromise between one designed for information storage and retrieval
tasks (usually a small, slow machine) and one designed for general scientific
tasks (usually larger and faster). Alternatively, two types of computers
could be selected: one for performing the information storage and
retrieval tasks, the other for performing the mapping tasks. (During the
early efforts to implement the MOD system, this alternative method was
used to good advantage.)
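
The main-memory estimate given above for gridded maps can be reproduced with a few lines of arithmetic. The names are illustrative, and the 20,000-word allowance for the program is an assumption inferred from the difference between the stated grid requirement (30,000 words) and the stated total (approximately 50,000 words); the text does not give that figure directly.

```python
VALUES_PER_POINT = 3  # X, Y, and Z, one computer word each


def grid_words(points_per_side, values_per_point=VALUES_PER_POINT):
    """Words of main memory needed to hold a square grid of data points."""
    return points_per_side * points_per_side * values_per_point


trend_map_words = grid_words(10)       # 10 x 10 grid: 100 points
detailed_map_words = grid_words(100)   # 100 x 100 grid: 10,000 points

# Assumption: roughly 20,000 words are set aside for the program itself,
# inferred from the stated total of about 50,000 words.
PROGRAM_WORDS = 20_000
total_words = detailed_map_words + PROGRAM_WORDS
```

Because the requirement grows with the square of the grid dimension, doubling the detail along each axis quadruples the storage needed for the grid.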

The computer time used in the design and implementation of the MOD
project was, for the most part, rented on available computers or obtained
(gratis) from the Computer Sharing Exchange. Time was rented from the
Control Data Corporation in order to use their contour mapping program system.
(We used the CDC 3600 and 160-A computers and the CalComp 564 plotter.)
We were permitted to use several government computers on a non-interference
basis, including the IBM 7090 of the Strategy and Tactics Analysis Group
(STAG), the IBM 7094 at the National Aeronautics and Space Administration
(NASA), the IBM 7090 at the Naval Command Systems Support Activity
(NAVCOSSAC), and the CDC 3100 at the Naval Oceanographic Office (NAVOCEANO),
and we are most grateful for this opportunity. Computer programs were provided
by the Kansas Geological Survey and NAVOCEANO. Some maps were produced
for us by the University of Michigan on their IBM 7090. We also wrote some
of our own programs, and these were used at NASA, at NAVCOSSAC, and at the
AFIP computer center (which contains an IBM 360/30).

Software requirements can be met by any of the systems described,
but there are other factors to be considered (see 6.2.1). For example,
ALGOL is perhaps the least available language (in the United States, but
not in Europe). COBOL is an easy language in which to work, but does not
have the scientific capability of FORTRAN. On the other hand, FORTRAN
has a rather limited data processing capability, especially the versions
implemented by IBM. CDC FORTRAN has perhaps the best overall capability
for programming the MOD system, but is only available for CDC computers.
Most programs which we borrowed or purchased for map construction were
written in FORTRAN. (The exception was at the University of Michigan
where the MAD language is used.)

We have utilized both methods of obtaining programming services:
by hiring H. M. Kline, a computer analyst-programmer, and by contracting
with Planning Research Corporation for programming (as well as for
system analysis and design).

Conclusions from system analysis indicate that the MOD computer
system should be capable of performing the following functions:

(1) Input and edit data.

(2) Generate data files employing the input data.

(3) Input and edit queries.

(4) Retrieve disease/environmental information from
the data files based on the query set.

(5) Perform high-speed sorts.

(6) Calculate, using mathematical functions.

(7) Generate commands for an automatic data-
plotting device.

(8) Generate auxiliary hard-copy (printed reports).

(9) Display contents of any portion of the data files.
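
Functions (1) through (5) of the list above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the record fields, the validation rule, the single-term query, and all names are hypothetical and do not reproduce the MOD data-point format or any program written for the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    # Hypothetical fields; the actual MOD data-point layout is not reproduced here.
    disease: str
    x: float
    y: float
    value: float


def edit(raw_rows):
    """Function (1): accept raw rows, dropping any that fail basic validation."""
    points = []
    for disease, x, y, value in raw_rows:
        if disease and value >= 0:
            points.append(DataPoint(disease.strip().upper(), float(x), float(y), float(value)))
    return points


def retrieve(data_file, disease):
    """Functions (3) and (4): apply a one-term query to the data file."""
    return [p for p in data_file if p.disease == disease.strip().upper()]


def sort_for_plotting(points):
    """Function (5): order points by row (y), then column (x), before plotting."""
    return sorted(points, key=lambda p: (p.y, p.x))


# Illustrative rows only; the last one has no disease name and is rejected on input.
raw = [("malaria", 2, 1, 40), ("cholera", 1, 1, 5), ("malaria", 1, 1, 12), ("", 0, 0, 3)]
data_file = edit(raw)  # function (2): the stored data file
selected = sort_for_plotting(retrieve(data_file, "Malaria"))
```

The sorted selection is the kind of ordered point list that functions (6) and (7) would then turn into calculated grid values and plotter commands.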

These requirements indicate that a medium- to large-scale computer
is required for the final system. Design studies have shown ways by
which an interim implementation can be carried out on a small-scale computer
(such as the IBM 360/30 at AFIP), requiring only map generation to
be performed on a larger-scale system (such as the IBM 7090), because
existing programs require such a computer configuration. It would be
possible to convert all programs developed for such an interim system
(if they were written in COBOL or FORTRAN) into a common system for use
on a large-scale computer. An off-line plotter can be used to produce
all maps, and such plotters are readily available in the Washington
area.

ABSTRACT - This section reflects
the system design phase of the MOD
system. It considers in detail the
various subsystems:

Storage subsystem
Retrieval subsystem
Synthesis subsystem
Output subsystem

discussing their structure and their
function. Flow diagrams are presented.


"...a discovery is nothing more than the union
of two or more truths, to a useful end."

Ramon y Cajal

7.0 GENERAL CONSIDERATIONS

Data Processing is commonly defined as the "rearrangement and refinement
of raw data into a form suitable for further use" or "any procedure for
receiving information and producing a specific result" (Sippl, 1966, p. 88).
Automatic data processing is "data processing performed by a system of electronic
or electrical machines so interconnected and interacting as to reduce
to a minimum the need for human assistance or intervention" (Datamation,
1966, p. 40). Thus Data Processing includes all operations necessary
to produce the desired output (results) from the available input (data)
utilizing the selected computer hardware. These operations include whatever
manual interfacing with the computer is required to input data, corrections,
and requests into the system, as well as those tasks which can be performed
by the machines under the direction of suitable internally stored programs.

The information storage and retrieval (IS&R) portions of the MOD
system had been completely designed at the time when work on the project
was terminated. Because of this, the techniques for building, storing,
maintaining, and retrieving the MOD data could be specified, despite some
unresolved input and output problems, since these latter aspects are
independent of explicit input and output considerations once the
essential elements of a computerized system have been determined.

In the MOD system the essential element is the formulation of data
points, an important part of which consists of a LOF/MOF structure. For
the purpose of MOD data processing, LOC, VAL, and NAR of a data point can
all be treated in essentially the same manner as MOF's. These data points
must be stored and retrieved in the MOD system regardless of the manner in
which extrinsic problems may later be resolved. Moreover, all of the
possible mapping problems which might be encountered in using the MOD
system cannot be anticipated until an attempt is made to produce maps by
using actual MOD data. In particular, the acceptability of existing
computer mapping methods and programs cannot be ascertained a priori. The
LOF/MOF structure of the input (and implicit in retrieval requests) is
open-ended, not only to afford maximum flexibility to the system, but
also because not all pertinent descriptive elements can be predicted at
the outset.

In the MOD system it is anticipated that additions of new MOF's and
redefinitions of existing MOF's and LOF's are quite likely. Precise
requirements for such restructuring will become apparent only when actual
data are used as input into the system and real retrieval attempts are
conducted. Therefore, in a modular approach to the design and
implementation of the entire MOD system, the IS&R subsystems represent a
logical building block and test tool for the remaining facets of the
system.

Because of this, the following sections provide detailed design
specifications for the Storage and Retrieval Subsystems while the
Synthesis and Output Subsystems are treated in a more general fashion.
The descriptions of the first two subsystems contain specifications for
their immediate program design and implementation. These subsystems have
been designed so that subsequent modification to them should be
unnecessary; however, the rationale for the techniques and methods
utilized in those subsystems is given so that, if changes seem desirable,
it will be easier to evaluate their feasibility and complexity.

The designed system is applicable to either a magnetic tape or disk
computer configuration, but special consideration was given to the
additional processing which would be required in a tape system since,
during the design phase, it appeared that system implementation would be
with non-random-access files.

The formats for the various input cards are provided not only to
complete the design specifications of the Storage and Retrieval
Subsystems, but also in order that the data can, if desired, be collected
and transcribed onto cards simultaneously with future program
development.

Two functions of the Dictionary File are so intimately involved in the
synthesizing operations necessary to produce reports and maps from MOD
data that discussion of these functions is deferred until the Synthesis
Subsystem is described, although these two functions (gazetteer and grid)
could also be considered logically under the Storage Subsystem.

One of the most important programs to be written eventually is a
control program which will coordinate the operations of the various MOD
subsystems. This control program will read all the control information
and determine the proper subsystems to be called in at the appropriate
times. It will minimize possible procedural errors (and the necessity for
computer operator intervention) and maximize efficiency of total system
operation. Design of the control program will be based upon the finished
subsystems, for which reason this program will be the last one to be
designed and implemented.

The various components of the MOD system design are graphically
summarized in the overall functional chart of the system (Fig. 7.1).

Concrete examples are provided wherever possible to demonstrate and
amplify the abstract discussion. These examples are accurate and
realistic for illustrative purposes, but they are not necessarily
exhaustively complete, lest they become unmanageable.

7.1 STORAGE SUBSYSTEM

7.1.1  DATA  INPUT  CARDS 

The data contained on the data extraction forms are entered into the
MOD system by means of punched cards. An attempt has been made to allow
the data to be keypunched as they appear on this extraction form, with as
few additional instructions to the keypunch operators as possible. The
preprinted MOF designation, including its surrounding parentheses, is
punched for each MOF utilized. (These parentheses make the cards more
readable for manual verification.) Each LOF associated with this MOF is
then punched as it appears on the data form. A listing of some MOD data
input cards has already been given (Fig. 5.8).


Figure 7.2  Manner in which preprinted LOF code numbers function in
            automated validity checking during MOD data input.

   Punched card bearing, for the LOF "Isolation from urine" in the MOF
   "Method of diagnosis," the LOF code number designation 5 8 1, which
   consists of:

         5   LOF Reference Number
         8   MOF Code Digit
         1   Checksum (Luhn) Digit

   (a)  Storage Subsystem verifies that 8 is the proper MOF code digit
        for this particular MOF by checking the Dictionary File.

   (b)  Storage Subsystem changes the number 581, by using the programmed
        algorithm "substitute-value-substitute-value-substitute-etc."
        and the prestored table

           Actual value of digit        0 1 2 3 4 5 6 7 8 9
           Number to be substituted     9 8 7 6 5 4 3 2 1 0

        as follows:

                    5            8        1
                substitute     value  substitute
           to       4            8        8

   (c)  Storage Subsystem adds the digits of the new number 488 together
        (4 + 8 + 8) to obtain a sum of 20.

   (d)  Storage Subsystem tests this sum to see if it is a multiple of 10.

        If the sum is a multiple of 10, Ref. No. 5 is deemed valid and is
        stored in the MOD Data File.

        If the sum is not a multiple of 10, the number is deemed invalid
        or incorrect, and is rejected and output as an error.

LOF's may appear on the extraction form as preprinted code numbers,
written numeric quantities, written code numbers, and as written textual
spellings. For those MOF's containing LOF's which are numeric values
(i.e., "quantitative" LOF's), the numerals are always handwritten on the
form. However, for MOF's which contain an open-ended set of predefined
LOF's (alphabetic or "qualitative" LOF's), the LOF's may appear either as
the preprinted reference number appearing on the data form, as an
additional reference number supplied by the data analyst, or as the
textual spelling of the LOF written by the data extractor. In order to
minimize the possibility of otherwise undetectable keypunch errors, the
preprinted LOF numbers actually consist of the LOF reference number, a
MOF code digit, and a checksum (Luhn) digit; this is shown in Fig. 7.2.
In those MOF's which may be specified by several LOF's (which, hereafter,
are called multi-LOF MOF's), the LOF indications on the data form will
contain preprinted commas which will also be keypunched onto the input
cards to separate these LOF's. For MOF's which must be specified by a
single LOF (which, hereafter, are termed single-LOF MOF's), there will be
no preprinted commas. Vague or questionable data may be marked with a "?"
on the data extraction form and such question marks will be keypunched
immediately after the pertinent LOF's.
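
The validity check of Fig. 7.2 can be sketched in a few lines of Python.
This is only an illustration of the scheme as the figure describes it; the
function names are hypothetical, and it is assumed that the alternation
"substitute-value-substitute-..." always begins with the leftmost digit, as
it does in the three-digit example. Note that the substitution used here
(each digit d replaced by 9 - d) is the one tabulated in the figure, not the
digit-doubling rule of the check usually called the Luhn algorithm today.

```python
# Substitution table from Fig. 7.2: digit d is replaced by 9 - d.
SUBSTITUTE = {str(d): str(9 - d) for d in range(10)}


def transform(code: str) -> str:
    """Apply "substitute-value-substitute-value-..." from the left."""
    # Assumption: the alternation starts with the leftmost digit.
    return "".join(
        SUBSTITUTE[ch] if i % 2 == 0 else ch for i, ch in enumerate(code)
    )


def is_valid_lof_code(code: str, expected_mof_digit: str) -> bool:
    """Check reference number + MOF code digit + checksum digit."""
    if len(code) < 3 or not (code.isascii() and code.isdigit()):
        return False
    # The MOF code digit is the next-to-last digit; the expected value
    # would come from the Dictionary File.
    if code[-2] != expected_mof_digit:
        return False
    digit_sum = sum(int(ch) for ch in transform(code))
    return digit_sum % 10 == 0


print(transform("581"))                # 488
print(is_valid_lof_code("581", "8"))   # True  (4 + 8 + 8 = 20)
print(is_valid_lof_code("681", "8"))   # False (3 + 8 + 8 = 19)
```

A mispunched reference digit changes the digit sum, so any single-digit
keypunch error in the number is rejected under this scheme.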

These data input cards, which are to be used for both initial data
entry and subsequent data maintenance, have the following format:

                                              Card Columns

   Data Point Number
      Year of Extraction                          1-2
      Month of Extraction                         3-4
      Day of Extraction                           5-6
      Extractor's Identification (EID)            7-9
      Data Point Number (that day)               10-12

   Card Type (if required)                       13-13

   Data Field                                    14-80


The data field contains the MOF and LOF data in free form, hence it
has no predetermined subfields. Blanks not embedded within a textual
spelling of a LOF are optional between entries. Thus any card could
contain only one MOF and one LOF, or as many as eight MOF's if each MOF
contained only one preprinted LOF code designation.
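
As an illustration of the card layout above, the following Python sketch
splits an 80-column card image into its fixed fields. The class and function
names are hypothetical; only the column boundaries are taken from the format
table, and the sample card is one of those shown in the example at the end
of the Data File discussion.

```python
from dataclasses import dataclass


@dataclass
class Card:
    year: str        # columns 1-2
    month: str       # columns 3-4
    day: str         # columns 5-6
    extractor: str   # columns 7-9 (EID)
    sequence: str    # columns 10-12, data point number that day
    card_type: str   # column 13, blank on initial entry
    data_field: str  # columns 14-80, free form

    @property
    def data_point_number(self) -> str:
        """The full 12-column data point number."""
        return (self.year + self.month + self.day
                + self.extractor + self.sequence)


def parse_card(image: str) -> Card:
    """Split one card image into the fixed fields of the input format."""
    image = image.ljust(80)[:80]   # pad short lines, ignore any overflow
    return Card(
        year=image[0:2],
        month=image[2:4],
        day=image[4:6],
        extractor=image[6:9],
        sequence=image[9:12],
        card_type=image[12],
        data_field=image[13:80].rstrip(),
    )


card = parse_card("670828JDS004R(TOD) DAWN, DUSK (SSZ) 9 (SEA) WINTER")
print(card.data_point_number)   # 670828JDS004
print(card.card_type)           # R
print(card.data_field)          # (TOD) DAWN, DUSK (SSZ) 9 (SEA) WINTER
```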

Although one card can be punched without special instructions to the
keypunch operator, special instructions are necessary to handle
continuation cards for a data point. These instructions (rules) also
reduce the amount of preliminary processing required:

(1)  Each card must contain the data point number
     and card type in columns 1-13.

(2)  A LOF must be entirely contained on a single
     card (whether numeric, code reference number,
     or textual spelling).

(3)  If a new LOF of a previously designated MOF
     is to be placed on a different input card, the
     MOF designation must be repeated.

These requirements limit the length of a LOF to 62 characters (67
characters in the data field minus 5 characters for the MOF designation).
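
The free-form data field and the 62-character limit can be illustrated with
a short Python sketch. It is a simplified reading of the rules above, not
the program design itself: the function name is hypothetical, the set of
multi-LOF MOF's (which would come from the Dictionary File) is passed in as
an assumed parameter, and "?" marks and narrative continuation numbers are
not handled.

```python
import re

MOF_PATTERN = re.compile(r"\(([A-Z0-9]{3})\)")   # e.g. "(TOD)"
MAX_LOF_LENGTH = 62                               # 67 columns minus "(XXX)"


def parse_data_field(field, multi_lof_mofs=frozenset()):
    """Return {MOF designation: [LOF, ...]} for one card's data field."""
    # re.split with one capturing group yields:
    #   [text before first MOF, MOF, text, MOF, text, ...]
    parts = MOF_PATTERN.split(field)
    result = {}
    for mof, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if mof in multi_lof_mofs:
            # Commas separate LOF's only in multi-LOF MOF's.
            lofs = [p.strip() for p in text.split(",") if p.strip()]
        else:
            lofs = [text] if text else []
        # The narrative (NAR) has no size restriction.
        if mof != "NAR" and any(len(lof) > MAX_LOF_LENGTH for lof in lofs):
            raise ValueError(f"LOF exceeds {MAX_LOF_LENGTH} characters in ({mof})")
        result.setdefault(mof, []).extend(lofs)
    return result


print(parse_data_field("(TOD) DAWN, DUSK (SSZ) 9 (SEA) WINTER", {"TOD"}))
# {'TOD': ['DAWN', 'DUSK'], 'SSZ': ['9'], 'SEA': ['WINTER']}
print(parse_data_field("(LOC) POPE CO., ILLINOIS"))
# {'LOC': ['POPE CO., ILLINOIS']}
```

Treating the comma as a separator only for multi-LOF MOF's is what allows a
single-LOF value such as a county and state to contain a comma of its own.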

As shall become evident, this size limitation proves convenient for the
Dictionary File. Note that, for input purposes, NAR (narrative) of a data
point can be treated as another MOF, but the "MOF" for NAR has no size
restrictions. It is suggested that each narrative card contain a
continuation card number in the data field immediately preceding the MOF
designation (i.e., starting in card column 14).


File maintenance cards have the same format as the original data cards.
The type of maintenance to be performed is indicated in the card-type
field, card column 13, by the following codes:

D -  Delete all MOF's indicated on the card, or if
     no MOF is indicated, delete the entire data
     point record.

R -  Replace the LOF's of the indicated MOF with
     the listed LOF's. This operation is a strict
     replacement; if only one of a series of LOF's
     for a MOF is to be changed, all of the
     immutable LOF's must also be indicated on the
     replacement card if these changes are to be
     effected in one computer pass.

A -  Add the designated LOF to the indicated MOF.
     If a LOF already exists for this MOF, the
     new LOF will be added to the existing LOF('s)
     if the MOF may have several LOF's.

A blank in the card-type field is utilized on initial entries. Each
card, obviously, can contain only one maintenance code although several
MOF's may be specified on the one card.


From this it is seen that all data cards contain a MOF designation in
card columns 14-18, with the possible exception of narrative (NAR) cards.
All types of cards may contain as many or as few MOF's and LOF's as are
consonant with the rules given for continuation cards.


Normally, file maintenance and creation of a given data point will be
performed at separate points in time. However, if several types of
entries are processed at the same time for the same data point, they will
be considered in the following order of precedence:

(1) Delete;  (2) Replace;  (3) Add;  (4) Initial Entry
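
The maintenance codes and their order of precedence can be sketched as
follows. This is a minimal Python illustration under stated assumptions: the
Data File is modelled as a dictionary keyed by data point number, the cards
are assumed to be already parsed into (data point, card type, MOF/LOF)
triples, all names are hypothetical, and error handling (for example, a
Replace card for a nonexistent record) is omitted. The sample data are
abridged from the example that closes the Data File discussion.

```python
# Order of precedence: Delete, Replace, Add, Initial Entry (blank).
PRECEDENCE = {"D": 0, "R": 1, "A": 2, " ": 3}


def apply_cards(data_file, cards):
    """Apply (data_point, card_type, {MOF: [LOF, ...]}) entries in place."""
    ordered = sorted(cards, key=lambda c: (c[0], PRECEDENCE[c[1]]))
    for point, kind, mofs in ordered:
        if kind == "D":
            if not mofs:
                data_file.pop(point, None)       # delete the entire record
            else:
                for mof in mofs:                 # delete only the named MOF's
                    data_file.get(point, {}).pop(mof, None)
        elif kind == "A":
            record = data_file.setdefault(point, {})
            for mof, lofs in mofs.items():       # append to existing LOF's
                record.setdefault(mof, []).extend(lofs)
        else:
            # "R" is a strict replacement of each listed MOF; a blank card
            # type is an initial entry, stored the same way in this sketch.
            record = data_file.setdefault(point, {})
            for mof, lofs in mofs.items():
                record[mof] = list(lofs)
    return data_file


data_file = {
    "670828JDS004": {"TOD": ["MORNING"], "SSZ": ["10"],
                     "SDA": ["L. GRIPPO", "L. BALLUM"]},
    "670828JDS005": {"LOC": ["JOHNSON CO., ILLINOIS"], "VAL": ["12"]},
}
cards = [
    ("670828JDS004", "A", {"SDA": ["L. HYOS"]}),
    ("670828JDS004", "R", {"TOD": ["DAWN", "DUSK"], "SSZ": ["9"]}),
    ("670828JDS005", "D", {}),
    ("670829JDS001", " ", {"LOC": ["MASSAC CO., ILLINOIS"], "VAL": ["5"]}),
]
apply_cards(data_file, cards)
print(data_file["670828JDS004"]["TOD"])   # ['DAWN', 'DUSK']
print(data_file["670828JDS004"]["SDA"])   # ['L. GRIPPO', 'L. BALLUM', 'L. HYOS']
print("670828JDS005" in data_file)        # False
```

Sorting by data point number and then by precedence means the cards may be
submitted in any order within a run.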

All of the MOD input data which have been accepted by the MOD system
are stored in the Data File. This file consists of one logical record for
each data point. Each record contains the data point number and all MOF's
and LOF's which pertain to that data point. Even in coded form these data
are quite variable since there are common and optional MOF's, since some
MOF's may contain several LOF's, and since the data may include an
unspecified amount of narration (NAR). For these reasons it is
impractical to utilize fixed-length records for a data point.

All LOF's (except material appearing under the narrative, NAR) will
be represented in the Data File by numbers. A "?" will be appended to any
LOF entry for which the input data were so marked. The actual values of
quantitative LOF's will be used; however, as shall be seen in the next
section, qualitative LOF's will be represented by a code number the size
of which depends upon the number of levels in its generic tree structure.

Each LOF must be associated with its appropriate MOF. This could be
accomplished in several ways, e.g., each LOF or group of LOF's could be
immediately preceded in the record by its MOF designation. There is a
serious disadvantage to this solution because the entire record would
have to be searched to locate any given MOF. A more desirable method is
to create an index within each record which would establish the relative
location (within the record) of the first LOF for each MOF. This would
require that the length assigned to each LOF be provided since length
varies from MOF to MOF. The index itself could be of either fixed or
variable length since not all MOF's are present in each data point (but
sufficient locations could be set aside to provide for all presently
defined MOF's in the index). A fixed-length index would somewhat reduce
searching requirements; however, many programs of the MOD system would
have to be updated in order to process a new index structure when new
MOF's were added to the dictionary. Because of this complication we have
chosen to make the index variable and to consist of the MOF designation,
relative starting location, and length of LOF code for each MOF.

In general, the MOF designations will be arranged in alphabetical
order in the index, and the LOF representations will also be sequenced in
this MOF order. The narrative (NAR) should appear last in the logical
record and, perhaps, even in a separate physical record. It would be
desirable to place certain fixed-length single-LOF essential MOF's (e.g.,
the geographical location (LOC) and the value (VAL) for the data point)
in a fixed location within each data point record for facility in sorting
and other manipulations. If this were done, it would not be necessary to
include these MOF's in the index.

The  format  of  each  data  point  logical  record  can  then  be  described  as: 

(1)  Data  Point  Number  —  fixed  location,  format, 
and  length. 

(2)  Predetermined  essential  MOF's,  LOC,  and  VAL  — 
fixed  location,  format,  and  length. 

(3)  Record  index  of  other  MOF's  —  fixed  starting 
location  and  format,  but  variable  length. 

(4)  LOF's  —  variable  starting  location,  format, 
and  length. 

(5)  Narrative  (NAR)  —  variable  starting 
location,  format,  and  length. 
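
The record layout described above, with its variable-length index of (MOF
designation, relative starting location, LOF code length), can be
illustrated with a small Python sketch. The function names, the dictionary
representation of a record, and the sample code values are all assumptions
made for illustration; the actual record format would depend on the computer
selected. The narrative (NAR) is left out for brevity.

```python
def build_record(essentials, other_mofs):
    """Lay out one logical record.

    essentials : fixed-location fields, e.g. {"LOC": "0417", "VAL": "0002"}
    other_mofs : {MOF: [code, ...]}; each MOF has at least one code and all
                 codes of one MOF have the same length
    """
    index, body, position = [], [], 0
    for mof in sorted(other_mofs):               # alphabetical MOF order
        codes = other_mofs[mof]
        width = len(codes[0])
        index.append((mof, position, width))     # designation, start, length
        body.extend(codes)
        position += width * len(codes)
    return {"essentials": essentials, "index": index, "lofs": "".join(body)}


def lofs_for(record, mof):
    """Find a MOF's LOF codes through the index, without scanning the LOF area."""
    index = record["index"]
    for i, (name, start, width) in enumerate(index):
        if name == mof:
            # The MOF's area ends where the next MOF's area begins.
            end = index[i + 1][1] if i + 1 < len(index) else len(record["lofs"])
            area = record["lofs"][start:end]
            return [area[k:k + width] for k in range(0, len(area), width)]
    return []                                    # MOF absent from this data point


record = build_record(
    {"LOC": "0417", "VAL": "0002"},              # hypothetical internal codes
    {"TOD": ["03", "05"], "SDA": ["010203", "010204"], "SSZ": ["0009"]},
)
print(record["index"])           # [('SDA', 0, 6), ('SSZ', 12, 4), ('TOD', 16, 2)]
print(lofs_for(record, "TOD"))   # ['03', '05']
```

Because the index carries the MOF designation itself, a newly defined MOF
adds an index entry without changing the layout that existing programs
expect, which is the reason given above for preferring a variable index.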

Design of the MOD system has been based largely upon two disease
"models" for reasons discussed in Sections 1 and 2. However, virtually an
unlimited number of diseases could be processed by the system. This would
require only design of new data extraction forms and the selection and
definition of new MOF's and additional LOF's (even for previously
existing MOF's). There would seem to be no requirement for maintaining a
different dictionary for each disease although it might prove desirable
to place different disease data in separate data files.

As in the case of diseases, an almost unlimited variety of
environmental data could be processed by the MOD system, with appropriate
new data extraction forms, MOF's, and LOF's. The present system is capable
of drawing environmental maps, but will seldom be used to do that.
Instead, the scale and projection of MOD-produced disease maps will be
adjusted to correspond with those of existing environmental maps so that
the one may be readily compared with the other. Environmental factors
extracted along with disease data will be considered only with the data
point for which they are included as MOF's; hence their output capacity
(under these conditions) will be restricted to retrieval functions.
However, a data file of environmental factors could be built from either
or both the input disease data and that derived from separate
environmental data extraction forms. Environmental data points generated
by the former means would contain the associated medical data point
number; those by the latter would not be associated directly with
particular disease data points. In addition, it would be possible to
produce single environmental factor files by techniques which would
digitize existing maps.

From these considerations it is evident that environmental maps could
be produced from the MOD data files in which each data point was obtained
from environmental data (if present) or disease data. In a retrieval
which included both environmental and disease conditions, the user could
specify that only those factors explicitly associated with the disease
should be considered. On the other hand he could broaden his retrieval to
include corresponding factors from the environmental files or even those
from other disease files.

As an illustration of the foregoing, if the Data File contained
internal codes equivalent to the following data:

   670828JDS004  (LOC)  POPE CO., ILLINOIS
                 (VAL)  2
                 (TIB)  621225
                 (TIE)  630228
                 (SEA)  [illegible in source]
                 (TOD)  MORNING
                 (SSZ)  10
                 (LSZ)  17
                 (SDA)  L. GRIPPO, L. BALLUM
                 (MDG)  ISOLATION FROM TISSUE
                 (NAR)  TRANSMSN POSS PREDATION ON FERAL HOUSE-MOUSE,
                        TITERS: LBALL-1:100

   670828JDS005  (LOC)  JOHNSON CO., ILLINOIS
                 (VAL)  12
                 (SDA)  LEPTOSPIRA

then the following cards:

   670828JDS004R(TOD) DAWN, DUSK (SSZ) 9 (SEA) WINTER
   670828JDS004R2(NAR) TITERS: LHYOS-1:1000; LBALL-1:100
   670828JDS004A(SDA) L. HYOS
   670828JDS005D

   670829JDS001  (LOC)  MASSAC CO., ILLINOIS
                 (VAL)  5
                 (TIB)  63
                 (TIE)  63
                 (TOD)  AFTERNOON, DUSK
                 (SSZ)  37
                 (LSZ)  37
                 (SDA)  L. CANICOLA

would cause the Data File to contain data equivalent to:

   670828JDS004  (LOC)  POPE CO., ILLINOIS
                 (VAL)  2
                 (TIB)  62-12-25
                 (TIE)  63-02-28
                 (SEA)  WINTER
                 (TOD)  DAWN, DUSK
                 (SSZ)  9
                 (LSZ)  17
                 (SDA)  L. GRIPPO, L. BALLUM, L. HYOS
                 (NAR)  TRANSMSN POSS PREDATION ON FERAL HOUSE-MOUSE
                        TITERS: LHYOS-1:1000; LBALL-1:100

   670829JDS001  (LOC)  MASSAC CO., ILLINOIS
                 (VAL)  5
                 (TIB)  63
                 (TOD)  AFTERNOON, DUSK
                 (SSZ)  37
                 (LSZ)  37
                 (SDA)  L. CANICOLA

7.1.3 DICTIONARY FILE


The Dictionary File is the central element of the MOD system. It is
the link which connects input data to stored data to retrieved output
data. This file contains the dictionary of terms which are allowed as
both data descriptors and query descriptors; thus the dictionary is also
a bridge between the environment of the medical doctor and the
environment of the computer.

A logical record in the Dictionary File contains descriptors for a
complete MOF, i.e., both the MOF description and all its associated LOF
descriptions. The MOF description consists of the complete English
language description (long form) of the MOF, and the abbreviated
description (short form) consisting of three letters, e.g., Specific
Disease Agent: SDA. The LOF description consists of the English language
description of the LOF and the LOF code number.

In order to keep the internal data consistent, but to allow freedom
of synonymous expression externally, synonyms and variant spellings can
be incorporated in the Dictionary File. Variant spellings will be
corrected as the data are input so that all printouts will contain the
preferred form of each LOF. Synonyms are keyed to the preferred form, but
are carried internally with their own identifier. This permits a query on
a group of synonymous terms or on the specific term requested (called
synonym lockout). As a further convenience to the query requestor, terms
which fall within a category are automatically provided. This is
accomplished by means of an internal tree structure of terms. For
example, a query on "mice" would yield data on Muridae, also on each of
the members of the family: Mus musculus, Pitymys, etc.

The Dictionary File contains all of the MOF's and LOF's which have
been defined for the MOD system and has been designed for either magnetic
tape or disk storage. A MOF is considered as defined in the MOD system if
the Dictionary File contains a record for that MOF. MOF's should be so
defined that no MOF contains both quantitative and qualitative LOF's. A
quantitative (or numerical) LOF is considered as defined if it has a
valid numeric value. A qualitative (or alphabetic) LOF is considered as
defined if the dictionary contains a LOF entry for it.

A MOF record includes the textual spelling of the MOF name and its
(short form) MOF designation. If a MOF has quantitative LOF's, the MOF
record will also contain an indication of the type of edit checking to
which its LOF's are to be subjected. The validity of dates and ranges can
be tested in numeric LOF's. If a MOF consists of qualitative LOF's, the
MOF record will also contain the MOF code digit which is associated with
each LOF number on the data extraction form.

For qualitative MOF's there will be a record for each LOF thus far
defined in the system. This record will contain an indication of the
structural relationship of the LOF to all other LOF's within the MOF.
These relationships consist of generic tree levels, synonyms, and variant
spellings.

Each LOF (within a MOF) is assigned a reference number. This number,
which is provided by Dictionary File listings and used in updating,
appears on the data extraction form along with the MOF code digit and a
checksum digit. The structural relationships are indicated by the LOF
code number. This code number is composed of reference numbers, one for
each tree level (the reference numbers for the (main) LOF at each higher
tree level plus the reference number of this LOF itself). In addition,
another reference number is used for synonym designation. Thus a code
number consists of a series of reference numbers, the length of the
series being one greater than the total number of levels in the given
MOF. Although variant spellings are separate entries, they are assigned
the same reference number as the LOF for which they are a variant.
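
How such code numbers support generic (tree) retrieval and synonym lockout
can be sketched in Python, using code numbers copied from the (PGS) example
given later in this section. The function name and the matching rule are
assumptions made for illustration: a query is taken to select every entry
whose code number shares the query term's non-zero tree levels, and synonym
lockout is interpreted as returning only the exact term requested (the text
does not say whether subordinate terms would still be included).

```python
LEVELS = 3   # tree levels in the (PGS) MOF; a code number has LEVELS + 1 parts

# (spelling, code number) pairs from the (PGS) example; the last part of
# each code number is the synonym designation.
PGS = [
    ("ANTHROPOIDEA",          (1, 0, 0, 0)),
    ("CATARRHINI",            (1, 2, 0, 0)),
    ("OLD WORLD ANTHROPOIDS", (1, 2, 0, 3)),
    ("PONGIDAE",              (1, 2, 4, 0)),
    ("SIMIIDAE",              (1, 2, 4, 5)),
    ("GREAT APES",            (1, 2, 4, 6)),
    ("CERCOPITHECIDAE",       (1, 2, 7, 0)),
    ("PLATYRRHINI",           (1, 8, 0, 0)),
    ("HAPALIDAE",             (1, 8, 9, 0)),
    ("CEBIDAE",               (1, 8, 10, 0)),
]


def retrieve(term, entries, synonym_lockout=False):
    """Return the spellings selected by a query on `term`."""
    code = dict(entries)[term]
    if synonym_lockout:
        # Assumed interpretation: only the specific term requested.
        return [name for name, c in entries if c == code]
    tree = code[:LEVELS]
    # Significant prefix: tree levels up to the last non-zero reference number.
    depth = max(i + 1 for i, ref in enumerate(tree) if ref != 0)
    return [name for name, c in entries if c[:depth] == tree[:depth]]


print(retrieve("CATARRHINI", PGS))
# ['CATARRHINI', 'OLD WORLD ANTHROPOIDS', 'PONGIDAE', 'SIMIIDAE',
#  'GREAT APES', 'CERCOPITHECIDAE']
print(retrieve("SIMIIDAE", PGS))
# ['PONGIDAE', 'SIMIIDAE', 'GREAT APES']
print(retrieve("SIMIIDAE", PGS, synonym_lockout=True))
# ['SIMIIDAE']
```

Because a synonym carries the tree levels of its preferred term, a query on
any member of a synonymous group selects the whole group unless lockout is
requested.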

The explicit format of the Dictionary File will, of course, depend
upon the computer selected and the available external storage devices. In
any event, the file should be structured to facilitate input and
retrieval of the data even though this will require additional processing
within the dictionary.

Since the input data will include both reference numbers and words,
random searching within a MOF record can be eliminated if both an
alphabetical order (for the words) and a numeric order (for the reference
numbers) are maintained. This method will also be helpful in processing
word LOF's of the retrieval requests and reference number results of the
retrieval.

These two sequences within the dictionary can be maintained on a disk by
separating the LOF records into two sections. The alphabetical order
records would contain the LOF word and reference number; the numeric
order records would contain the reference number, the code number, and an
index number to indicate the location of the related alphabetic order
record. (There would only be one numeric order record for all variant
spellings of the same word.) In this way utilization of disks is
minimized. If the Dictionary File is maintained on magnetic tape, both
records should contain the LOF word and code number, and in essentially
the same format of LOF word, reference number, and code number. To
minimize the amount of processing time required, they should be kept in
different (physical) files.

The order of the MOF's themselves, within the Dictionary File, is
somewhat arbitrary; however, two important factors should be considered.
First, the Dictionary File maintenance cards must eventually be sequenced
in the same order as the Dictionary File. Secondly, on tape, it would be
most convenient to place together an entire set of LOF's common to more
than one MOF. (Perhaps the order of MOF's included on the data extraction
form could be arranged to facilitate this.)

The following example, which illustrates the internal structure of a
MOF, will also serve as an example of MOF construction. (The synthesis
section contains a somewhat similar example, but deals with geographic
locations.) In this and following examples, brackets "[ ]" indicate
variant spellings and parentheses "( )" indicate synonyms.


Consider a MOF, "Primate groups involved in study (PGS)", composed of
several LOF's arranged in the following tree-structure, in which the
underlined LOF's (shown here between underscores) are to be added to the
MOF:


             Anthropoidea                              Prosimi
          (_Higher Primates_)                     (_Lower Primates_)
            /            \                       /        |         \
     Catarrhini        Platyrrhini       Lemuroidea  Tassioidea  _Tupaioidea_
     [Catarhini]         /     \
     [_Catarini_]  Hapalidae  Cebidae
 (Old World Anthropoids)
       /        \
  Pongidae    Cercopithecidae
  [Pongiidae]
  (Simiidae)
  [Simidae]
  (Great Apes)

On magnetic tape, the MOF, (PGS), would appear as follows:


ALPHABETIC ORDER

MOF DESC  MOF TEXTUAL SPELLING                # ENTRS   # LVL   CODE

(PGS)     PRIMATE GROUPS INVOLVED IN STUDY       13       3       2

MOF DESC  LOF TEXTUAL SPELLING          LOF CODE #       REF #

(PGS)     ANTHROPOIDEA                   1   0   0   0      1
(PGS)     CATARHINI                      1   2   0   0      2
(PGS)     CATARRHINI                     1   2   0   0      2
(PGS)     CEBIDAE                        1   8  10   0     10
(PGS)     CERCOPITHECIDAE                1   2   7   0      7
(PGS)     GREAT APES                     1   2   4   6      6
(PGS)     HAPALIDAE                      1   8   9   0      9
(PGS)     LEMUROIDEA                    11  12   0   0     12
(PGS)     OLD WORLD ANTHROPOIDS          1   2   0   3      3
(PGS)     PLATYRRHINI                    1   8   0   0      8
(PGS)     PONGIDAE                       1   2   4   0      4
(PGS)     PONGIIDAE                      1   2   4   0      4
(PGS)     PROSIMI                       11   0   0   0     11
(PGS)     SIMIIDAE                       1   2   4   5      5
(PGS)     SIMIDAE                        1   2   4   5      5
(PGS)     TASSIOIDEA                    11  13   0   0     13


NUMERIC ORDER

(PGS)     PRIMATE GROUPS INVOLVED IN STUDY       13       3       2  *

(PGS)     ANTHROPOIDEA                   1   0   0   0      1
(PGS)     CATARRHINI                     1   2   0   0      2
(PGS)     CATARHINI                      1   2   0   0      2  *
(PGS)     OLD WORLD ANTHROPOIDS          1   2   0   3      3
(PGS)     PONGIDAE                       1   2   4   0      4
(PGS)     PONGIIDAE                      1   2   4   0      4  *
(PGS)     SIMIIDAE                       1   2   4   5      5
(PGS)     SIMIDAE                        1   2   4   5      5  *
(PGS)     GREAT APES                     1   2   4   6      6
(PGS)     CERCOPITHECIDAE                1   2   7   0      7
(PGS)     PLATYRRHINI                    1   8   0   0      8
(PGS)     HAPALIDAE                      1   8   9   0      9
(PGS)     CEBIDAE                        1   8  10   0     10
(PGS)     PROSIMI                       11   0   0   0     11
(PGS)     LEMUROIDEA                    11  12   0   0     12
(PGS)     TASSIOIDEA                    11  13   0   0     13

*  These records could be eliminated from the numeric order file.


If the Dictionary File were maintained on a disk file, the alphabetical
order records for any MOF would be similar to those for a tape file, with
the exception that the MOF designator would only need to appear in the MOF
description record. The numeric order records could contain an indication
of the location of the textual description of the LOF rather than the
description itself, and would appear as follows for the MOF (PGS):


LOCATION      LOF CODE        REF #
    1        1   0   0   0      1
    3        1   2   0   0      2
    9        1   2   0   3      3
   11        1   2   4   0      4
   15        1   2   4   5      5
    6        1   2   4   6      6
    5        1   2   7   0      7
   10        1   8   0   0      8
    7        1   8   9   0      9
    4        1   8  10   0     10
   13       11   0   0   0     11
    8       11  12   0   0     12
   16       11  13   0   0     13
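
The first number of each disk record above can be read as the position of the base spelling in the alphabetical section. The following is a minimal Python sketch of that reading, assuming the location is simply the 1-based rank in an alphabetical sort of all sixteen (PGS) spellings, variant spellings included. The names SPELLINGS and numeric_order_locations are illustrative only and are not part of the MOD design.

```python
# Hypothetical illustration: (spelling, reference number, is_variant_spelling).
SPELLINGS = [
    ("ANTHROPOIDEA", 1, False), ("CATARRHINI", 2, False), ("CATARHINI", 2, True),
    ("OLD WORLD ANTHROPOIDS", 3, False), ("PONGIDAE", 4, False),
    ("PONGIIDAE", 4, True), ("SIMIIDAE", 5, False), ("SIMIDAE", 5, True),
    ("GREAT APES", 6, False), ("CERCOPITHECIDAE", 7, False),
    ("PLATYRRHINI", 8, False), ("HAPALIDAE", 9, False), ("CEBIDAE", 10, False),
    ("PROSIMII", 11, False), ("LEMUROIDEA", 12, False), ("TASSIOIDEA", 13, False),
]


def numeric_order_locations(spellings):
    """Map each reference number to the alphabetical rank of its base spelling."""
    ranked = sorted(text for text, _, _ in spellings)
    rank = {text: i + 1 for i, text in enumerate(ranked)}
    # Variant spellings share a reference number, so only base spellings are indexed.
    return {ref: rank[text] for text, ref, variant in spellings if not variant}


locations = numeric_order_locations(SPELLINGS)
for ref in sorted(locations):
    print(locations[ref], ref)
```

Under this assumption the sketch reproduces the location references of the disk listing for all thirteen reference numbers.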
MAPPING OF DISEASE


After the four new (underlined) LOF's were added to the MOF, (PGS),
the Dictionary File would include the following four new records (in both
alphabetical and numeric order) in a tape system:


(PGS)     CATARINI                   1   2   0   0      2
(PGS)     HIGHER PRIMATES            1   0   0  14     14
(PGS)     LOWER PRIMATES            11   0   0  15     15
(PGS)     TUPAIOIDEA                11  16   0   0     16

The resultant numeric order section of the MOF, (PGS), would appear as
follows in a disk system, where the location references have been changed
to reflect additional LOF's.


LOCATION      LOF CODE        REF #
    1        1   0   0   0      1
    4        1   2   0   0      2
   12        1   2   0   3      3
   14        1   2   4   0      4
   18        1   2   4   5      5
    7        1   2   4   6      6
    6        1   2   7   0      7
   13        1   8   0   0      8
    8        1   8   9   0      9
    5        1   8  10   0     10
   16       11   0   0   0     11
   10       11  12   0   0     12
   19       11  13   0   0     13
    9        1   0   0  14     14
   11       11   0   0  15     15
   20       11  16   0   0     16
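
The four added records and the revised location references can be traced with a short Python sketch. It rests on assumptions inferred from the listings rather than stated rules: a synonym copies the three level entries of its base word and takes the next reference number as its synonym entry, a new LOF takes the next reference number at its own level beneath the parent, and a variant spelling receives no number of its own. All function and variable names are hypothetical.

```python
# Existing (PGS) records: reference number -> LOF code (three levels, then synonym).
codes = {
    1: (1, 0, 0, 0), 2: (1, 2, 0, 0), 3: (1, 2, 0, 3), 4: (1, 2, 4, 0),
    5: (1, 2, 4, 5), 6: (1, 2, 4, 6), 7: (1, 2, 7, 0), 8: (1, 8, 0, 0),
    9: (1, 8, 9, 0), 10: (1, 8, 10, 0), 11: (11, 0, 0, 0),
    12: (11, 12, 0, 0), 13: (11, 13, 0, 0),
}
base_spelling = {
    1: "ANTHROPOIDEA", 2: "CATARRHINI", 3: "OLD WORLD ANTHROPOIDS",
    4: "PONGIDAE", 5: "SIMIIDAE", 6: "GREAT APES", 7: "CERCOPITHECIDAE",
    8: "PLATYRRHINI", 9: "HAPALIDAE", 10: "CEBIDAE", 11: "PROSIMII",
    12: "LEMUROIDEA", 13: "TASSIOIDEA",
}
variants = {"CATARHINI": 2, "PONGIIDAE": 4, "SIMIDAE": 5}


def add_variant(text, ref):
    variants[text] = ref                      # shares the existing record


def add_synonym(text, ref):
    new_ref = max(codes) + 1
    codes[new_ref] = codes[ref][:3] + (new_ref,)
    base_spelling[new_ref] = text
    return new_ref


def add_new_lof(text, level, parent_ref):
    new_ref = max(codes) + 1
    path = list(codes[parent_ref][:level - 1]) + [new_ref]
    path += [0] * (3 - len(path))             # lower levels stay empty
    codes[new_ref] = tuple(path) + (0,)
    base_spelling[new_ref] = text
    return new_ref


add_variant("CATARINI", 2)
add_synonym("HIGHER PRIMATES", 1)
add_synonym("LOWER PRIMATES", 11)
add_new_lof("TUPAIOIDEA", 2, 11)

# Disk locations: alphabetical rank among all spellings, variants included.
ranked = sorted(list(base_spelling.values()) + list(variants))
location = {ref: ranked.index(text) + 1 for ref, text in base_spelling.items()}
for ref in sorted(codes):
    print(location[ref], *codes[ref], ref)
```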


7.1.4 DICTIONARY INPUT CARDS


Dictionary File building and maintenance consists of the following
operations:

> Construction (or Reconstruction): the entire
Dictionary File, or a set consisting of several
of its component MOF's, is constructed or
reconstructed to correct gross errors. In addition,
entire new MOF's are incorporated into the
Dictionary File by this method.

> Updating: new LOF's are added to MOF's already
existing in the Dictionary File.

> Correction: a LOF or MOF is deleted or has its
verbal description changed.

A single card format has been designed to process all of these types
of file maintenance. This format provides uniformity in the coding of all
dictionary cards and allows for the recreation of the entire Dictionary
File, utilizing all existing cards. The same format is also used to generate
the MOF description entry. The general format of these cards (in which each
element is left justified) is as follows:


CARD COLUMNS  CONTENTS                                      USAGE

1-5           MOF three character designation enclosed      All types*
              by the usual parentheses.

6-6           MOF code - one digit used to verify           MOF entries
              keypunching of coded entries or special
              code to indicate that the MOF is
              processed in an exceptional manner.

7-72          Clear text spelling of the MOF or LOF.        All types
              (Note that this spelling is limited
              to 66 characters.)

73-74         Structure indicator (MOF entry contains       All types
              total number of levels in structure).

75-80         (LOF entries only) Existing dictionary        Updating &
              LOF reference number for referenced LOF.      Correction

              Desired external LOF reference number         Construction
              if number is to be preprinted in the
              data extraction form.

(*MOF designation is somewhat redundant for construction of LOF
entries, but is helpful in ordering the cards.)
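
The fixed-column layout lends itself to simple slicing. The sketch below, in Python, splits an 80-column card image into the fields described above and builds a card in the same layout. The class and function names are hypothetical, and the use of columns 75-80 for the LOF reference number follows the description given under Dictionary Updating.

```python
from dataclasses import dataclass


@dataclass
class DictionaryCard:
    mof: str        # columns 1-5, e.g. "(PGS)"
    mof_code: str   # column 6 (MOF entries only)
    text: str       # columns 7-72, clear text spelling
    indicator: str  # columns 73-74, structure indicator
    lof_ref: str    # columns 75-80, LOF reference number


def parse_card(card: str) -> DictionaryCard:
    """Split one card image into its left-justified fields."""
    card = card.ljust(80)[:80]
    return DictionaryCard(
        mof=card[0:5].strip(),
        mof_code=card[5:6].strip(),
        text=card[6:72].strip(),
        indicator=card[72:74].strip(),
        lof_ref=card[74:80].strip(),
    )


def make_card(mof, mof_code, text, indicator, lof_ref=""):
    """Build an 80-column card image; spellings over 66 characters are not handled."""
    return f"{mof:<5}{mof_code:<1}{text:<66}{indicator:<2}{lof_ref:<6}"


lof_card = parse_card(make_card("(PGS)", "", "CATARHINI", "$"))
mof_card = parse_card(make_card("(PGS)", "2", "PRIMATE GROUPS INVOLVED IN STUDY", "3"))
print(lof_card)
print(mof_card)
```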

7.1.4.1 MOF Construction (or Reconstruction) In order to create a MOF
in the Dictionary File or to update a MOF to include new generic levels, the
entire MOF must be constructed (or reconstructed) as a unit.

For simplicity in coding, the LOF's are sequenced by their generic
level and contain a structure indicator which designates this level, or their
usage as a synonym, or a variant spelling.

These  indicators  are  as  follows: 

1,2,3,...  level 
$  variant  spelling 

=  synonym 

Each LOF is assigned only one indicator for brevity and ease in
resequencing, so that variant spellings and synonyms are considered to be at
the previously indicated level. Since variant spellings pertain to a
particular word whereas synonyms apply to a possible group of words, variant
spellings must immediately follow their object word. Moreover, since the
connotation of words cannot be considered, the sequence of a group of
synonyms, including the determination of the base word (assigned the level
indicator), is somewhat optional.

If an existing MOF is restructured, both the Data and Dictionary File
entries which pertain to that MOF must be recreated. Hence, when the level
structure of a MOF may contain unknown lower levels, it is desirable to
indicate the maximum number of levels possible for the MOF without actually
assigning any LOF's to those levels. The introduction of a new MOF to the
Dictionary File does not require the recreation of the Data File since no
LOF's can exist for that MOF. LOF's of such newly-introduced MOF's would
have to be entered into the Data File by means of Data File maintenance.

Consider the MOF, (PGS), previously used as an illustrative example,
with the following structure:


Anthropoidea
    Catarrhini [Catarhini]
    (Old World Anthropoids)
        Pongidae [Pongiidae]
        (Simiidae) [Simidae]
        (Great Apes)
        Cercopithecidae
    Platyrrhini
        Hapalidae
        Cebidae
Prosimii
    Lemuroidea
    Tassioidea

The MOF (PGS) could be properly constructed with the following
dictionary cards:


MOF DESC  CODE  TEXTUAL SPELLING                   STRUCTURE IND
1-5       6     7-72                               73-74

(PGS)     2     PRIMATE GROUPS INVOLVED IN STUDY   3
(PGS)           ANTHROPOIDEA                       1
(PGS)           CATARRHINI                         2
(PGS)           CATARHINI                          $
(PGS)           OLD WORLD ANTHROPOIDS              =
(PGS)           PONGIDAE                           3
(PGS)           PONGIIDAE                          $
(PGS)           SIMIIDAE                           =
(PGS)           SIMIDAE                            $
(PGS)           GREAT APES                         =
(PGS)           CERCOPITHECIDAE                    3
(PGS)           PLATYRRHINI                        2
(PGS)           HAPALIDAE                          3
(PGS)           CEBIDAE                            3
(PGS)           PROSIMII                           1
(PGS)           LEMUROIDEA                         2
(PGS)           TASSIOIDEA                         2

The previous example (first listing) illustrates the Dictionary
File records resulting from input of the above cards.
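
How the structure indicators could give rise to the LOF codes and reference numbers of the first listing can be sketched in Python as follows. This is an interpretation under stated assumptions, not the documented algorithm: reference numbers are taken to be assigned sequentially in card order, a variant spelling ($) reuses the record of the entry it follows, and a synonym (=) keeps the current level path and records its own number in the synonym entry. The name construct and the card list are illustrative.

```python
CARDS = [  # (textual spelling, structure indicator), in input order
    ("ANTHROPOIDEA", "1"), ("CATARRHINI", "2"), ("CATARHINI", "$"),
    ("OLD WORLD ANTHROPOIDS", "="), ("PONGIDAE", "3"), ("PONGIIDAE", "$"),
    ("SIMIIDAE", "="), ("SIMIDAE", "$"), ("GREAT APES", "="),
    ("CERCOPITHECIDAE", "3"), ("PLATYRRHINI", "2"), ("HAPALIDAE", "3"),
    ("CEBIDAE", "3"), ("PROSIMII", "1"), ("LEMUROIDEA", "2"),
    ("TASSIOIDEA", "2"),
]


def construct(cards, levels=3):
    """Return (spelling, LOF code, reference number) for each card."""
    records = []
    path = [0] * levels   # reference numbers along the current tree branch
    last = None           # (code, ref) of the most recent non-variant entry
    next_ref = 1
    for text, indicator in cards:
        if indicator == "$":          # variant spelling of the preceding word
            code, ref = last
        else:
            ref = next_ref
            next_ref += 1
            if indicator == "=":      # synonym at the previously indicated level
                code = tuple(path) + (ref,)
            else:                     # numbered level: restart the path there
                level = int(indicator)
                path[level - 1:] = [ref] + [0] * (levels - level)
                code = tuple(path) + (0,)
            last = (code, ref)
        records.append((text, code, ref))
    return records


records = construct(CARDS)
for text, code, ref in records:
    print(f"(PGS) {text:<24}", *code, ref)
```

Because the order of the cards is the only carrier of the tree structure, the same cards in a different order would yield a different dictionary.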
7.1.4.2 Dictionary Updating Since the Dictionary File is, by definition,
open-ended, new LOF's may be added at any time. These LOF's may be
incorporated into the Dictionary File either when the data points are encoded
originally or when the system indicates that a previously undefined LOF has
been encountered. In the latter case a pre-punched card containing the MOF
designation and a clear text spelling of the undefined LOF will automatically
be provided by the system. This will insure that the correct LOF is defined
and that all such LOF's are considered. The LOF reference numbers will be
provided in the dictionary printouts.

A previously undefined LOF can fall into any one of the following
categories:

(1) Variant spelling for an existing LOF

(2) Synonym for an existing LOF

(3) New LOF

If the LOF is in category 1 or 2, it is incorporated into the
Dictionary File merely by equating it to the appropriate dictionary entry.
Card columns 75-80 are used to indicate the LOF reference number of the
corresponding dictionary entry, and the structure indicator will contain the
variant spelling ($) or synonym (=) symbol. If the LOF is a new word it
must be related to an existing LOF unless the MOF has only one level. The
relationship is determined by describing where in the tree structure the
new LOF belongs, and is indicated by assigning a level in the structure
indicator and by providing the LOF reference number of the (base) entry under
which the new LOF should appear. Other new LOF's on the same tree branch
are coded with a structure indicator, but without any LOF numbers. Thus
the method for updating is identical to that for construction, with the
exception that explicit LOF numbers must be included for certain entries.
(These updating cards could be combined with the initial MOF construction
cards to reconstruct any MOF's.)
Consider the MOF's, "Carnivore groups involved in study (CGS)" and
"Rodent  groups  found  during  survey  (RGS)",  with  the  following  structures, 
in  which  the  underlined  LOF's  are  to  be  added  to  the  existing  MOF's: 


(CGS)


Canidae [Caniidae]


Felidae
(Cats)


Ursiidae [Ursidae]
(Bears)


(RGS)  Rodentia

The following dictionary cards would be required to include the
preceding new LOF's (underlined) into the MOF's, (CGS), and (RGS), where
N(k) equals the LOF reference number of the existing LOF entry k.


MOF DESC  CODE  TEXTUAL SPELLING  ST IND                  Comments
1-5       6     7-72              73      75-80

(CGS)           CANIIDAE          $       N(Canidae)      order is
(CGS)           CATS              =       N(Felidae)      immaterial

(CGS)           URSIIDAE          1                       assignment
(CGS)           URSIDAE           $                       of synonym
(CGS)           BEARS             =                       permissive

(RGS)           RODENTS           =       N(Rodentia)     order is
(RGS)           MURIDEE           $       N(Muridae)      immaterial
(RGS)           HAMSTERS          3       N(Cricetidae)

(RGS)           SCIURIDAE         2       N(Rodentia)     an order
(RGS)           SCIURIIDAE        $                       is
(RGS)           SCIURADEE         $                       essential
(RGS)           SCIURIDS          =
(RGS)           SQUIRRELS         =
(RGS)           CHIPMUNK          3
(RGS)           SCIURUS           3
(RGS)           SQUIRREL          =

7.1.4.3 Dictionary Correction Due to the generic nature of many of the
MOF's, structural changes in the Dictionary File would be difficult and
cumbersome to accomplish, and to describe, in terms of updating. Moreover,
such changes to the Dictionary File would make the Data File obsolete.
For these reasons structural changes should be achieved by MOF reconstruction,
limiting dictionary correction to such changes as would not affect entries in
the Data File.


Correction of the dictionary is, therefore, restricted to the following
functions:


(1) Change in textual description of a LOF or
MOF: indicated by a "/" in card column
73 (structural indicator).

(2) Delete any MOF or LOF entirely: indicated
by a "D" in card column 73.

Correction of a variant spelling requires special consideration since
such LOF's do not possess a unique LOF number by which they can be
identified. Because of this the elimination of a variant spelling can never
alter the structure of a MOF. Variant spellings can be physically removed from
the Dictionary File if the structural indicator "D" is utilized with the
textual description of the variant spelling. This is the only type of
Dictionary File maintenance in which this description field, card columns
7-72, contains the LOF to be operated on. (For ease of processing the LOF
number should also be indicated.) If a variant spelling is to be corrected
it must be deleted and the correct variant spelling entered as an update.


Changes in the Dictionary File for LOF's (or MOF's) which are not
variant spellings are accomplished by using the structural indicator "/"
and their existing LOF number (or MOF designation). The deletion of such
a LOF would normally be accomplished by use of "D" and its former LOF
number; changing the textual description of a LOF to a blank field would also
delete that LOF. In either event, the LOF text would be considered undefined
for input. In the latter case, however, the LOF reference number would not
be undefined when used in the input data.

When a LOF is deleted, synonyms and lower level tree-structured LOF's
remain in the Dictionary File unless explicitly deleted. Moreover, any
future maintenance of the remaining LOF's must reference the original level
of the LOF.
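
The correction rules can be illustrated with a toy dictionary for a single MOF. The Python sketch below is only an interpretation: the two-table representation, the function name apply_correction, and the replacement text "APES" are hypothetical, and the treatment of any variant spellings attached to a deleted LOF is left open because the text does not specify it.

```python
def apply_correction(base_text, variants, indicator, ref=None, text=""):
    """Apply one correction card.

    base_text: {reference number: spelling}
    variants:  {variant spelling: reference number}
    """
    if indicator == "D" and text in variants:
        del variants[text]        # variant spellings are identified by their text
    elif indicator == "D" and ref in base_text:
        del base_text[ref]        # synonyms and lower levels are left untouched
    elif indicator == "/" and ref in base_text:
        base_text[ref] = text     # a blank text keeps the number defined
    else:
        raise ValueError("correction refers to a non-existent entry")


base_text = {2: "CATARRHINI", 4: "PONGIDAE", 5: "SIMIIDAE", 6: "GREAT APES"}
variants = {"CATARHINI": 2, "SIMIDAE": 5}

apply_correction(base_text, variants, "D", text="SIMIDAE")      # remove a variant
apply_correction(base_text, variants, "/", ref=6, text="APES")  # change description
apply_correction(base_text, variants, "D", ref=4)               # delete by number
print(base_text)
print(variants)
```

A reference to an entry that does not exist raises an error here, in line with the statement that such references are flagged during dictionary processing.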

To illustrate the foregoing, consider the existing MOF, "Pelecypod
groups found in water reservoirs (HWR)", with the following structure:


This MOF could be transformed into the MOF, "Clams found in drinking-
water reservoirs (HWR)", with the following structure:


Quahog and soft-shell clams are still physically at level 3 in the
Dictionary File; however, logically they could be considered as
being at level 1.

MOF DESC  CODE  TEXTUAL SPELLING                            ST IND
1-5       6     7-72                                        73      75-80

(HWR)           CLAMS FOUND IN DRINKING-WATER RESERVOIRS    /
(HWR)           PECTINACEA                                  /       N(Pectinoidea)
(HWR)           LIMIDS                                      /       N(Lima)
(HWR)           PECTEEN                                     D       N(Pecten)
(HWR)           PECTIN                                      $       N(Pecten)
(HWR)                                                       /       N(Non-pectinid p.)
(HWR)                                                       /       N(normal-t. clams)
(HWR)                                                       /       N(razor clams)



7.1.5  STORAGE  PROCESSING 

The objective of the MOD Storage Subsystem is to build and maintain
a collection of valid data point records from the input data. This requires
that the validity of incoming records be checked by the Dictionary File,
hence the Dictionary File must also be built and maintained. The MOD
Storage Subsystem has been designed so that the processing of the dictionary
and the data can be accomplished either simultaneously or individually (see
Fig. 7.3).

When both operations are to take place the dictionary processing is
accomplished first. In an initial run it would be advantageous to
predefine a subset of dictionary terms to reduce the number of undefined
LOF's. In subsequent runs, maintenance to the dictionary might well include
both previously undefined LOF's and newly defined LOF's for the current
input data.

7.1.5.1 Dictionary Building and Maintenance All types of file
maintenance input cards are processed to build or revise the Dictionary File.
The original input sequence of these cards must be maintained since the order
indicates the structural relationships of the LOF's within a MOF.
Furthermore, this original sequence cannot be recreated by machine.

In general the type of maintenance to be performed is designated in
columns 73-74 of the input cards; however, MOF reconstruction (or
construction) can only be differentiated from regular updating by the
presence of a level entry in the MOF maintenance card. MOF maintenance cards
differ from LOF cards in that MOF cards contain a code in card column 6.

The sequence of processing in Construction, Reconstruction, and
Updating of the Dictionary File is as follows:

(1) (TAPE ONLY) Build a magnetic tape record from
each input card that contains a generated serial
number which is an ordered combination of
the MOF order on the Dictionary File, any
indicated LOF reference number, and the
original input sequence number.

(2) (TAPE ONLY) Sort these generated records by
their serial number. The textual spelling
may be added as the minor field of the sort
to facilitate processing corrections to
variant spellings.

(3) Process input against existing numerical
records in the Dictionary File.

(4) Build MOF record.

(5) Assign next sequential reference number to
new LOF.

(6) Construct LOF code number.

(7) Build numeric order LOF record.

(8) (TAPE ONLY) Sort numeric order records into
alphabetic order.

(9) (DISK ONLY) Determine index number of alphabetic
order for numeric order and create alphabetic
record.

(10) Print new dictionary entries alphabetically.

(11) List file maintenance errors, if any.

(12) At user's option, print the entire Dictionary File.
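
Steps (1) and (2) can be sketched in Python as a sort on a composite key. The sketch assumes, without support from the text, that a card carrying no LOF reference number inherits the number of the nearest preceding card for the same MOF, so that the cards of one tree branch stay together. The reference numbers, the MOF order, and the function name are hypothetical.

```python
def assign_serial_numbers(cards, mof_order):
    """cards: list of (mof, spelling, lof_ref or None) in original input order."""
    serials = []
    current_ref = {}
    for seq, (mof, spelling, lof_ref) in enumerate(cards, start=1):
        if lof_ref is not None:
            current_ref[mof] = lof_ref          # start of a new branch
        # Serial number: MOF order, LOF reference number, input sequence number.
        serials.append((mof_order[mof], current_ref.get(mof, 0), seq))
    return serials


cards = [
    ("(RGS)", "SCIURIDAE", 1),
    ("(RGS)", "SCIURIIDAE", None),
    ("(CGS)", "CANIIDAE", 3),
    ("(CGS)", "CATS", 2),
]
mof_order = {"(CGS)": 1, "(RGS)": 2}

serials = assign_serial_numbers(cards, mof_order)
ordered = [card for _, card in sorted(zip(serials, cards))]
for card in ordered:
    print(card)
```

The input sequence number as the last component keeps cards for the same entry in the order in which they were read.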


The dictionary listings have the following format:


Errors  are  listed  separately  after  the  dictionary  listing  and  have  the  same 
general  format  as  above  plus  an  explanation  of  the  MOF  or  LOF  error,  printed 
on  the  right  side  of  the  page. 

This basic dictionary processing has the following variations,
according to the type of maintenance being performed:




Construction or Reconstruction: The previous numerical records for
the MOF are disregarded. Any LOF's with designated reference numbers are
processed first; the remaining LOF's are assigned reference numbers. MOF
reconstruction requires that the Data File be regenerated to insure that the
proper LOF code numbers are contained therein.

Update: The previous numerical records for the MOF are retained and
new records are generated as required.

Correction: The previous numerical records are retained for all the
LOF's in the MOF except for those which are deleted.

Since the order of the cards indicates the structure of the MOF, it
must be assumed that the input order is correct; if these cards are not in
the proper sequence the MOF will have to be reconstructed. Deletions,
corrections, references to non-existent LOF's, and references to
unidentifiable MOF's will be flagged as errors.

7.1.5.2 Data File Processing After an initial Dictionary File is built,
input processing then creates the Data File from the data input cards. This
processing not only creates new data point records but also corrects and
updates existing data point records in the Data File. An existing Dictionary
File is necessary in order to process the input MOF's and LOF's properly.
Input entries are matched against the Dictionary File to insure the validity
of all MOF's and LOF's, and to convert qualitative LOF's to their numeric
code number for internal storage in the Data File. Undefined LOF's,
unallowable or invalid LOF's and MOF's, and any other detectable errors are
listed during this procedure. Incomplete data point records are maintained
as a separate Incomplete Data File until some corrective action is taken.

The input processing functions are performed by the following programs:

(1) FORMAT DATA: This program transforms the free-form input data
into fixed-format magnetic tape records in which each LOF is an individual
record. Each record contains the Data Point Number, the MOF designation,
and one LOF. The LOF may be in the form of a LOF reference number, a textual
spelling, or a numeric value. The FORMAT DATA program also assigns a code
number to the MOF's that corresponds to the MOF order within the Dictionary
File. In addition, the various types of Data File maintenance cards are
assigned a code in accordance with the order of file maintenance precedence.
This maintenance code is used as a minor sort field after the data has been
translated. Under the rules established for keypunching the input data, each
input card is an entity, hence it can be processed in any desired order. It
is not required that narrative (NAR) records be properly sequenced at this
time.


(2) SORT FORMATTED DATA: The output tape from the FORMAT DATA program
is then sorted by the assigned MOF code number and by the LOF reference number
(less MOF code digit and the checksum digit). At the same time, the Incomplete
Data File is incorporated into the sort as a second reel of input. Both the
new Formatted Data File and the Incomplete Data File will have the same format
and may be considered as one entity during the succeeding processing. The
purpose of this program is to speed the matching of MOF's and LOF's with the
Dictionary File; however, if the Dictionary File is on magnetic tape, this
operation becomes essential rather than merely a means of increased
efficiency. This sort operation will sequence all the LOF's into alphabetic
and numeric order within each MOF (but the LOF sequence has functional
significance only for qualitative LOF's).

(3) TRANSLATE DATA: This program is the bridge between the input
data and the Data File. Here, LOF records from the preceding sort program
will be compared with appropriate entries in the Dictionary File. Each
qualitative LOF will be tested for definition. If defined, the LOF code
number will be added to the LOF data record. LOF reference numbers will be
matched against the numerical order section of the dictionary; in addition,
the validity of their MOF code digit and checksum digit will also be
determined (see Fig. 7.2). LOF textual spellings will be compared with the
alphabetic order section of the dictionary. If such a textual entry does
not exist in the Dictionary File, the entire entry is listed and a card is
punched. This card will contain the MOF designation and the textual
spelling of the undefined LOF, in the Dictionary File maintenance format, so
that it can be entered later into the Dictionary File maintenance programs
without a need to create an entire entry and without danger of mis-punching
the textual spelling. The punching of LOF cards will be summarized at the
LOF level, e.g., even though several data points have the same undefined LOF
for a given MOF, only one LOF card will be produced. Each quantitative LOF
will be tested for validity as indicated by the MOF. Alternatively, this
MOF validity indication may be added to the LOF record and tested in the
edit program. The LOF record for any LOF which is undefined, or invalid,
or which refers to an unidentifiable MOF will be flagged. But only the
undefined qualitative LOF's will be listed at this time (allowing all undefined
words to be analyzed with respect to the structure of their MOF). In
consequence, this listing will be uncluttered and will correspond to the
punched cards. All the errors will be listed later by data point number in
the data editing program so that the errors can also be considered in terms
of the entire data point.
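
The handling of textual spellings in TRANSLATE DATA can be sketched as a lookup that flags undefined LOF's and produces one punched card per undefined spelling. The following Python fragment is a simplified illustration, not the program itself: the record layout, the flag value "UNDEFINED", and the undefined spelling "HYLOBATIDAE" are hypothetical, and reference-number and quantitative checks are omitted.

```python
def translate(records, dictionary):
    """records: (data point number, MOF, spelling) tuples, already sorted.

    dictionary: {(MOF, spelling): LOF code} for the alphabetic order section.
    Returns the translated records and the list of cards to be punched.
    """
    translated, punched, seen = [], [], set()
    for point, mof, spelling in records:
        code = dictionary.get((mof, spelling))
        if code is None:
            # Summarize at the LOF level: one card per undefined (MOF, spelling).
            if (mof, spelling) not in seen:
                seen.add((mof, spelling))
                punched.append((mof, spelling))
            translated.append((point, mof, spelling, None, "UNDEFINED"))
        else:
            translated.append((point, mof, spelling, code, ""))
    return translated, punched


dictionary = {("(PGS)", "PONGIDAE"): (1, 2, 4, 0)}
records = [
    (1, "(PGS)", "HYLOBATIDAE"),
    (2, "(PGS)", "HYLOBATIDAE"),
    (3, "(PGS)", "PONGIDAE"),
]
translated, punched = translate(records, dictionary)
print(punched)
```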

(4) SORT TRANSLATED DATA: The translated data, including all error
records, will be sorted by Data Point Number, MOF designation, LOF, and file
maintenance type. This will provide for the immediate updating and
construction of the Data File from all of the appropriate LOF records. The
narrative (NAR) records and those for any other non-retrievable "MOF" will be
sequenced as the last records for each data point, and by continuation number,
if applicable.

(5) UPDATE & EDIT DATA FILE: The data point records of the Previous
Data File will be updated by the output of the SORT TRANSLATED DATA program.
For processing facility, sufficient main memory should be available to
contain one data point record from the old Data File, all of the new LOF input
records for a data point, and a buffer area for a new data point record.
None of these data point records would have to include any narrative at this
time. One should have access to the entire set of new LOF's for a data point
so that all of its component items can be written on the Incomplete Data File
if necessary. If main memory availability is severely restricted, these
incomplete data records could be purged by additional processing.

The updating and editing for each data point will consider one entire
MOF at a time. Any necessary file maintenance operations will first be
performed in accordance with the order of maintenance precedence. File
maintenance errors such as deletion, addition, or replacement of non-existent
entries will be listed. Then required edit checking will be done. In some
instances, consistency among different MOF's may be tested. Several specific
processing steps, necessitated by the characteristics of the MOD data, will
also be carried out during the input processing for the Data File. For
example, the data-reliability MOF "Computer evaluation of data point" will
be calculated according to a suitable algorithm, and the resulting number
stored as a numeric LOF for that MOF. Also, data points whose Specific Disease
Agent is specified as a logical sum of positive and negative items will be
split for storage and later processing into one point for all the positives
(with a non-zero value) and one zero-valued data point for all the negative
items. Finally, the entire newly formed data point will be searched to
insure that all essential MOF's are present. Then the record index of updated
or new data point records will be appropriately revised.

An Incomplete Data File will be generated that will contain all of the
LOF records for those data points which lack essential MOF's or have
undefined qualitative LOF's. Such data points will not be included in the
Updated Data File. Other types of LOF errors will merely cause that LOF to be
eliminated from the appropriate file. If this elimination causes the loss
of an essential MOF, the data point will be transferred to the Incomplete
Data File.
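
The routing of a data point to the Updated or the Incomplete Data File can be expressed as a small decision function. The Python sketch below assumes a per-LOF status flag with the hypothetical values "" (valid), "UNDEFINED", and "INVALID"; the essential MOF designations "(SDA)" and "(LOC)" are placeholders and are not taken from the MOD dictionary.

```python
def classify(lof_records, essential):
    """lof_records: (MOF, value, status) tuples for one data point."""
    # Any undefined qualitative LOF sends the whole point to the incomplete file.
    if any(status == "UNDEFINED" for _, _, status in lof_records):
        return "incomplete"
    # Other LOF errors only eliminate the offending LOF.
    kept = [record for record in lof_records if record[2] == ""]
    present = {mof for mof, _, _ in kept}
    # Losing an essential MOF through elimination also makes the point incomplete.
    return "updated" if essential <= present else "incomplete"


essential = {"(SDA)", "(LOC)"}
good = [("(SDA)", 7, ""), ("(LOC)", 3, ""), ("(PGS)", 4, "INVALID")]
undefined = [("(SDA)", 7, ""), ("(LOC)", 3, ""), ("(PGS)", "HYLOBATIDAE", "UNDEFINED")]
lost_essential = [("(SDA)", 7, ""), ("(LOC)", 3, "INVALID")]

print(classify(good, essential))
print(classify(undefined, essential))
print(classify(lost_essential, essential))
```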

Both the Updated Data File and the Incomplete Data File are in data
point sequence. The purpose of the incomplete file is to simplify correction
requirements, also to facilitate the listing of these deficiencies until
they are remedied. Hence, only new or updated data point records will be
tested; unaltered records will simply be merged into the new (Updated) Data
File. Existing data point records will not be eliminated from the Data File
if erroneous corrections are made. Rather, these invalid corrections will
be listed and then ignored in the processing. If one wished to purge these
records onto the Incomplete Data File, the existing data point records would
have to be decomposed into component LOF records.

(6) DECOMPOSE DATA FILE: This operation is required if a MOF in the
dictionary has been reconstructed. In this event the LOF code numbers in
the data will usually be inaccurate and will have to be regenerated. This
can be accomplished if every data point record in the Data File is decomposed
into a group of separate LOF records. The LOF reference number can be
determined as being the lowest level entry in the LOF code number contained in
the Data File. (The regeneration of the Data File is possible because MOF
reconstruction does not alter the reference numbers.) The entire Previous
Data File can then be re-entered into the system in the form of LOF records,
as additional input to the SORT FORMATTED DATA program. These separate LOF
records will contain the MOF designation and the particular LOF item. For
qualitative LOF's, this item will be the LOF reference number.
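
Recovering the reference number from a stored LOF code can be sketched in one short Python function. It assumes that "lowest level entry" means the last non-zero entry of the code, with the synonym entry counted after the level entries; this reading agrees with the (PGS) listings but is an interpretation, and the function name is hypothetical.

```python
def reference_number(lof_code):
    """Return the last non-zero entry of a LOF code (levels, then synonym)."""
    for entry in reversed(lof_code):
        if entry:
            return entry
    raise ValueError("LOF code has no non-zero entry")


for code in [(1, 2, 4, 6), (1, 2, 0, 3), (1, 2, 4, 0), (11, 0, 0, 0)]:
    print(code, reference_number(code))
```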


7.2 RETRIEVAL SUBSYSTEM


The function of the MOD Storage Subsystem is to create a data base from
which the desired MOD output results can be produced. The MOD user will
obtain this output by means of a query to the MOD system. His query must
describe the following three aspects of the desired output:

(1) Retrieval conditions: any characteristics
that the data must contain in order to be
considered for output (such as specific
disease agent, or species infected, or time
period).

(2) Synthetic or manipulative operations:
any operations which must be performed
upon the retrieved data prior to output
(such as combining points by averaging
their values).

(3) Output specifications: the type, format,
and content of the desired output (such as
map projection to be used).

These are accomplished in the sequence given since a subset of data
points must first be considered, then operated upon, and, finally, displayed.
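
The three-stage sequence can be pictured as a simple pipeline. The Python sketch below is purely schematic: the field names "agent", "year", and "value", the sample points, and the function run_query are invented for illustration and do not reflect the MOD record layout or retrieval language.

```python
def run_query(points, condition, combine, render):
    selected = [p for p in points if condition(p)]  # (1) retrieval conditions
    combined = combine(selected)                    # (2) manipulative operations
    return render(combined)                         # (3) output specifications


points = [
    {"agent": "A", "year": 1960, "value": 2.0},
    {"agent": "A", "year": 1965, "value": 4.0},
    {"agent": "B", "year": 1965, "value": 9.0},
]

result = run_query(
    points,
    condition=lambda p: p["agent"] == "A",
    combine=lambda ps: sum(p["value"] for p in ps) / len(ps),
    render=lambda mean: f"mean value: {mean:.1f}",
)
print(result)
```

Keeping the three stages as separate arguments mirrors the statement that manipulative operations and output specifications are independent of retrieval.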

The Retrieval Subsystem (shown in Fig. 7.4) will now be discussed in
detail as it is more related, logically and physically, to the Storage
Subsystem than to the other two subsystems. Moreover, retrieval is the first
(and most fully developed) aspect of the entire query procedure.
Manipulative operations and output specifications are independent of
retrieval, and both of these will be discussed later.

7.2.1  RETRIEVAL  LANGUAGE 


In any retrieval system, items are selected for retrieval which satisfy
the given (query) conditions. The manner in which these conditions are
expressed is of the utmost importance for effective retrieval. At the
present stage of development of the MOD system there has been insufficient
experience in the areas of retrieval usage to determine optimal
specifications appropriate to the requirements (and background) of the
potential bio-medical users. For this reason an interim retrieval language
has been established. Based upon experience gained in actual use of the MOD
system, the interim retrieval language can be modified to yield a more
elaborate -- and efficient -- "ultimate" retrieval language. But with this
present MOD system design, the specific retrieval request would be formulated
by the data analyst from a more generalized query made by the bio-medically
oriented user. (Of course the user himself could formulate the retrieval
request if he were confident that he understood fully all the logical facets
of his query.) When there has been sufficient experience with the implemented
MOD system, the ultimate MOD retrieval language can also be formulated.

From a system viewpoint, those portions of the MOD query which relate
to retrieval can be expressed in terms of the interim language by a
preliminary processing program, then be operated upon by the Retrieval
Subsystem as specified in this section. To provide for this transition and to
allow for all logical requests, the interim language consists of all the
basic logic functions and operations of a general retrieval system, expressed
in the most direct and concise manner.


Let  us  consider  first  the  rules  of  logic  which  pertain  to  the  use  of 
operators  to  connect  conditions.  In  the  following,  A,  B,  C,  and  D  each 
represents  any  retrieval  condition. 


The logical operators, AND or OR, can combine any pair of conditions,
and the result is itself a condition. In the MOD interim retrieval language
"+" and "/" will be used to indicate AND and OR respectively. The meanings
of these operators are:

/ (OR)   The condition A / B is satisfied if A is
         true, or if B is true, hence, also, if
         A and B are both true.

+ (AND)  The condition A + B is satisfied if, and
         only if, both A and B are true.


Since the result of a logical operation upon two conditions is itself
a condition, another condition can be combined with it. But combinations
of unlike operators are not associative, hence parentheses must be used to
indicate the meaning of certain combinations.

If  the  logical  operators  are  the  same,  parenthetical  grouping  is  un¬ 
necessary. 

Example:  A / B / C = (A / B) / C = A / (B / C)

          A + B + C = (A + B) + C = A + (B + C)




If  the  logical  operators  are  mixed,  parenthetical  grouping  is  essential 
for  proper  meaning. 

Example:

A / (B + C) = (A / B) + (A / C) ≠ (A + C) / (B + C) = (A / B) + C
A + (B / C) = (A + B) / (A + C) ≠ (A / C) + (B / C) = (A + B) / C

Since parenthetical grouping is unnecessary for similar operations,
more than one condition can appear within a parenthesis, e.g.

(A / B / C) + D          (A + B + C) / D

Reapplying  these  rules,  an  infinite  number  of  levels  of  parenthetical 
grouping  can  be  established.  However,  any  expression  which  contains  higher 
levels  of  grouping  can  be  reduced  to  one  level  of  parenthetical  grouping  by 
appropriate  repetition. 

Example:  A / (B + (C / D)) = A / (B + C) / (B + D)

          A + (B / (C + D)) = A + (B / C) + (B / D)

          ((A + B) / C) + D = (A + B + D) / (C + D)

Thus the interim retrieval language can perform any desired retrieval
operation, if the following two rules are followed:

(1)  Parentheses are only used where necessary
(between unlike logical operators but not
between like operators).

(2)  Only one level of parentheses is allowed
(higher levels must be manually reduced).
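
The identities quoted in the examples above can be checked mechanically by
enumerating every truth assignment. The following is a minimal sketch in
Python (not part of the MOD design); the helper name `equivalent` and the
dictionary `checks` are illustrative assumptions, with "/" written as `or`
and "+" as `and`.

```python
from itertools import product

def equivalent(f, g, n):
    """True if f and g agree for every truth assignment of n conditions."""
    return all(f(*v) == g(*v) for v in product((False, True), repeat=n))

# "/" is OR and "+" is AND in the interim retrieval language.
checks = {
    # A / (B + C) = (A / B) + (A / C)
    "or over and": equivalent(
        lambda a, b, c: a or (b and c),
        lambda a, b, c: (a or b) and (a or c), 3),
    # A + (B / C) = (A + B) / (A + C)
    "and over or": equivalent(
        lambda a, b, c: a and (b or c),
        lambda a, b, c: (a and b) or (a and c), 3),
    # A / (B + (C / D)) = A / (B + C) / (B + D)
    "two levels reduced (or)": equivalent(
        lambda a, b, c, d: a or (b and (c or d)),
        lambda a, b, c, d: a or (b and c) or (b and d), 4),
    # ((A + B) / C) + D = (A + B + D) / (C + D)
    "two levels reduced (and)": equivalent(
        lambda a, b, c, d: ((a and b) or c) and d,
        lambda a, b, c, d: (a and b and d) or (c and d), 4),
    # A / (B + C) is NOT the same as (A / B) + C
    "regrouping changes meaning": not equivalent(
        lambda a, b, c: a or (b and c),
        lambda a, b, c: (a or b) and c, 3),
}

for name, ok in checks.items():
    print(name, ok)
```

Each entry evaluates to True, which is the sense in which a request with
higher levels of grouping can always be rewritten with one level.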

Thus  far  we  have  considered  conditions  abstractly,  and  treated  each 
condition  as  an  entity.  These  conditions  do  actually  apply  to  MOD  data 
however,  and  consist  of  several  components.  These  components  establish  a 
criterion  which  will  either  be  true  or  false  for  every  data  point  record 
of  the  Data  File.  We  are  not  merely  searching  for  the  presence  of  an  item 
in  the  Data  File;  it  is  necessary  that  this  item  be  considered  within  the 
proper  context,  i.e.,  a  specific  LOF  within  a  particular  MOF.  For  added 
flexibility we can allow the relationship of the LOF to the MOF to be other
than  equality. 

The  three  components  of  a  retrieval  condition  are: 

(1)  MOF  designation 

(2)  LOF  description 

(3)  Relational  operator 

The relational operators defined for the MOD retrieval system are:

=  Equality (including synonyms, variant spellings, and
less generic tree-structured components).

≡  Identity (including variant spellings, but not
synonyms).

≠  Inequality (i.e., not equal to).

<  Less than (significant only for numeric values).

>  Greater than (significant only for numeric values).

The usual MOF designations are used without their parentheses in a
retrieval request because the existence of the relational operators easily
distinguishes MOF's from LOF's. Moreover, since one level of parenthetical
grouping is allowed for logical grouping, use of parentheses for other
purposes in the language should be avoided.

To  be  consistent  with  our  rule  that  only  one  level  of  parenthetical 
grouping  be  allowed  in  a  retrieval  request,  each  condition  is  to  contain 
one and only one MOF and one LOF. If a criterion logically includes two
possible  LOF's  for  a  MOF,  the  MOF  must  be  explicitly  stated  twice  with  the 
proper  logical  operators. 

The LOF description of quantitative LOF's will consist of their actual
numeric value. Qualitative LOF's can be described either by their textual
spelling or their reference number (but textual description would probably
be the more useful method of specifying a qualitative LOF).
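
To make the three components of a condition concrete, the following is a
minimal sketch in Python of splitting one condition into its MOF
designation, relational operator, and LOF description. It is an
illustration only, not the MOD implementation: the names `Condition` and
`parse_condition` are hypothetical, and the ASCII spellings `==` (identity)
and `!=` (inequality) are assumed stand-ins for symbols that cannot be
punched directly.

```python
import re
from typing import NamedTuple

class Condition(NamedTuple):
    mof: str        # MOF designation, written without parentheses
    operator: str   # relational operator
    lof: str        # LOF description (text, reference number, or value)

# Assumed ASCII stand-ins: "=" equality, "==" identity, "!=" inequality.
# "==" and "!=" are listed before "=" so the longer spelling wins.
_PATTERN = re.compile(r"^\s*([A-Z]+)\s*(==|!=|=|<|>)\s*(.+?)\s*$")

def parse_condition(text):
    """Split one retrieval condition into (MOF, operator, LOF)."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid condition: {text!r}")
    mof, operator, lof = match.groups()
    if operator in ("<", ">"):
        # "<" and ">" are significant only for numeric values.
        float(lof)  # raises ValueError for a non-numeric LOF
    return Condition(mof, operator, lof)

print(parse_condition("VAL > 25.3"))
print(parse_condition("MFX = GOOD"))
```

Because each condition carries exactly one MOF and one LOF, a criterion such
as "GOOD or FAIR" has to be written as two conditions joined by "/".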
7.2.2  RETRIEVAL  REQUEST  CARDS

Retrieval requests will be activated in the MOD system by means of
retrieval request input cards. Several different sets of criteria may be
requested at the same time, and these will be distinguished by each being
assigned a different question number.


The  retrieval  cards  consist  of  two  fields.  The  first  contains  identi¬ 
fication  data,  question  number  (and,  possibly,  continuation  card  number). 

The  remainder  of  the  card  contains  the  requests  in  free  form  with  non- 
essential  blanks  optional.  Of  course  these  requests  must  be  formulated  in 
accordance  with  all  of  the  rules  described  in  the  preceding  section.  Identi¬ 
fication  data  will  be  listed  on  all  output  retrieval  reports. 

The maximum number of conditions per question and the maximum number
of questions that can be processed at the same time will have to be determined
prior to establishing precise rules for these items.


A  sample  set  of  request  cards  would  appear  as  follows: 


JDHS  5/12/67  1     (MFX = GOOD / QXF = HIGH) + VAL > 25.3
JDHS  5/12/67  2     (MFX = GOOD / MFX = FAIR) + (VAL > 20
JDHS  5/12/67  2     / FVL > .05) + SDA ≡ L. POMONA
JDHS  5/12/67  3     PHD ≠ WILD


7.2.3  RETRIEVAL  PROCESSING

The retrieval subsystem reads the request cards, checks the validity
of their form and content, and obtains any required LOF code numbers. This
subsystem then tests each data point record in the Data File on a match/
synonym basis and writes the selected records onto one or more magnetic tape
files.


First, validity of the format of the retrieval request cards is tested;
detected errors will be listed. If there exists an extraneous parenthesis
in the request, the "corrected" interpretation will be listed as a flag and
the requests processed in accordance with this interpretation.

After  the  validity  of  the  format  of  the  entire  retrieval  request  has 
been established, a condition record is generated for each condition in the
request.  These  condition  records  contain  the  following  elements: 

(1)  Question  number. 

(2)  Condition  number. 

(3)  Designated  MOF. 

(4)  Specified  LOF. 

(5)  Required  relational  operator. 

(6)  Next  operation  if  condition  is  true. 

(7)  Next  operation  if  condition  is  false. 

The  question  number  is  obtained  directly  from  the  request  cards.  The 
condition  number  indicates  the  sequence  of  each  condition  within  a  question. 
The  logical  sequence  must  be  maintained  in  order  to  execute  the  retrieval 
processing  properly. 

After these operations the validity of the requested MOF's and LOF's
is determined. For this purpose the conditions must be considered first in
MOF, then in LOF order. The volume of these requests will probably be such
that an external sort of the conditions will be unnecessary. The Dictionary
File is used to determine the validity of the MOF's and LOF's. The LOF
element of the condition records for quantitative LOF's will contain the
requested value. For qualitative LOF's, this element will contain those
portions of the LOF code number which are appropriate to the request.
Generally, this consists of all the reference number components of the code
number down to the level of the LOF being considered. (The level can be
determined from the Dictionary File by the presence of the first zero
reference code or the last reference code within the LOF code number.) But
if the desired relationship is one of identity, the entire LOF code number is
placed in the LOF element of the condition record. The relational operator of
the condition is tested to insure that it is appropriate for the type of LOF
being considered. This operator is then placed in the relation element of
the condition record. (The internal indication for identity can be the same
as that for equality since the composition of the code number will
distinguish between these relationships.)


The  contents  of  the  "next  operation"  field  can  be  determined  from  the 
logical  operator  (which  immediately  follows  the  present  condition)  if,  as 
is  the  case,  there  is  only  one  level  of  parenthetical  grouping  and  if  the 
logical  sequence  of  the  conditions  is  preserved: 


NEXT LOGICAL OPERATOR       NEXT OPERATION IF PRESENT CONDITION IS:

                            TRUE                     FALSE

None                        Select                   Reject

/ outside of parenthesis    Select                   Test next condition

/ inside of parenthesis     Test next condition      Test next condition
                            outside of parenthesis
                            or select if none
                            exists.

+ outside of parenthesis    Test next condition      Reject

+ inside of parenthesis     Test next condition      Test next condition
                                                     outside of parenthesis
                                                     or reject if none
                                                     exists.

This  selection  or  rejection  refers  to  the  entire  data  point  record 
being  tested.  The  non-existence  of  a  next  logical  operator  is  considered 
within  the  format  of  the  present  question  if  more  than  one  question  has 
been  requested. 
The condition records for some fundamental types of requests are now
provided. In the following, the MOF, LOF, and relational operator components
of a condition have been represented by a single letter for simplicity, and
each request has been assigned a different question number.

QUESTION       QUESTION   CONDITION   CONDITION   NEXT OPERATION:
               NUMBER     NUMBER      RECORD      IF TRUE    IF FALSE

A/B               1          1           A        Select     to 2
                  1          2           B        Select     Reject

A+B               2          1           A        to 2       Reject
                  2          2           B        Select     Reject

A+(B/C)           3          1           A        to 2       Reject
                  3          2           B        Select     to 3
                  3          3           C        Select     Reject

(A+B)/C           4          1           A        to 2       to 3
                  4          2           B        Select     to 3
                  4          3           C        Select     Reject

A/(B+C)/D         5          1           A        Select     to 2
                  5          2           B        to 3       to 4
                  5          3           C        Select     to 4
                  5          4           D        Select     Reject

(A+B)/(C+D)       6          1           A        to 2       to 3
                  6          2           B        Select     to 3
                  6          3           C        to 4       Reject
                  6          4           D        Select     Reject
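
The next-operation rules and the sample condition records above can be
sketched in Python as follows. This is an illustration under stated
assumptions, not the MOD program: the function names `condition_records`,
`is_selected`, and `agrees_with_direct_evaluation` are hypothetical, each
condition is a single name as in the samples, and the request is assumed to
obey the one-level parenthesis rule.

```python
import re
from itertools import product

SELECT, REJECT = "Select", "Reject"

def condition_records(request):
    """Return [(condition, if_true, if_false), ...]; an integer k means 'to k'."""
    conds = []         # one dict per condition, in request order
    open_group = None  # indices of conditions inside the open parenthesis
    for tok in re.findall(r"[A-Za-z]\w*|[()+/]", request):
        if tok == "(":
            if open_group is not None:
                raise ValueError("only one level of parenthesis is allowed")
            open_group = []
        elif tok == ")":
            if open_group is None:
                raise ValueError("extraneous parenthesis")
            for i in open_group:          # first condition after the group
                conds[i]["after"] = len(conds) + 1
            open_group = None
        elif tok in "+/":                 # operator following a condition
            conds[-1]["op"] = tok
            conds[-1]["inside"] = open_group is not None
        else:
            if open_group is not None:
                open_group.append(len(conds))
            conds.append({"name": tok, "op": None, "inside": False, "after": None})

    records = []
    for number, c in enumerate(conds, start=1):
        nxt = number + 1
        after = c["after"]
        outside = after if after is not None and after <= len(conds) else None
        if c["op"] is None:
            rec = (SELECT, REJECT)
        elif c["op"] == "/":
            rec = ((outside or SELECT) if c["inside"] else SELECT, nxt)
        else:  # "+"
            rec = (nxt, (outside or REJECT) if c["inside"] else REJECT)
        records.append((c["name"], *rec))
    return records

def is_selected(records, truth):
    """Walk the records for one data point; truth maps condition name to bool."""
    number = 1
    while True:
        name, if_true, if_false = records[number - 1]
        action = if_true if truth[name] else if_false
        if action in (SELECT, REJECT):
            return action == SELECT
        number = action

def agrees_with_direct_evaluation(request):
    """Compare the record walk with Python's and/or on every truth assignment."""
    records = condition_records(request)
    names = [name for name, _, _ in records]
    expr = request.replace("+", " and ").replace("/", " or ")
    return all(
        is_selected(records, dict(zip(names, values)))
        == bool(eval(expr, {}, dict(zip(names, values))))
        for values in product((False, True), repeat=len(names))
    )

for row in condition_records("A/(B+C)/D"):
    print(row)
```

A point worth noting in this sketch is that only the conditions actually
needed are tested: the walk stops as soon as a Select or Reject is reached.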

Any errors detected in the format or content of a retrieval question
will  cause  that  question  not  to  be  processed.  After  all  questions  and  condi¬ 
tions  have  been  verified,  the  user  will  have  an  option  as  to  whether  or  not 
retrieval  questions  without  errors  should  be  processed  if  other  questions  in 
his  request  contain  errors. 

After all the condition records have been generated for a request, each
data point in the Data File is tested against this set of condition records
by comparing the MOF, LOF, and relational operator. The location of the
LOF's within each data point record is indicated in the data record index.
The LOF description in the condition records will determine the matches.

Every data point is compared with the retrieval conditions set forth by each
retrieval question. The methods employed in the Data and Dictionary Files
have been established so that synonyms, variant spellings, and tree-structure
relationships do not have to be handled by long retrieval lists. Moreover,
the condition records enable the retrieval processing to test only those
LOF's required to ascertain whether a data point should be selected or
rejected.

If a data point record is selected it is written onto a magnetic tape
file and listed by Data Point Number as having been retrieved. Various
questions may be output onto different tape units; alternatively, the question
number may be appended to the data point records selected by that question.

The formats of the Updated Data File and the Retrieved Data File are identical,
hence either file may be used for subsequent output processing. Thus
synthetic or manipulative operations, or output specifications, may also work
against the entire Data File.

7.2.4  ALTERNATE  LOF  CODING  PROCEDURE 

A unique feature of the preceding Storage and Retrieval Subsystems is
the method of coding the qualitative LOF's. The code number of each
qualitative LOF is constructed to indicate the structural relationship for
retrieval purposes. The Data File contains those code numbers that consist of
a series of numbers whose total length is one greater than the number of
levels within the MOF. The Retrieval Subsystem scans all or part of this
series of numbers to determine if a selection criterion has been satisfied.

A LOF code number that consisted of only one number would suffice for
retrieval purposes if that number were properly formulated, and it is with
this consideration that we present the following alternative procedure.
Each LOF would have an original reference number. The LOF would then
be sorted by its descending structure relationships within the MOF. This
sequence would determine the present code number for the LOF's. A range of
code numbers consisting of the first and the last code numbers which
represent its structural relationship for each LOF could then be ascertained.
This method is possible because the range of code numbers has no missing
members in it -- because of the structural sequence.


This  single  number  code  system  would  substantially  shorten  the  lengths 
of  the  Data  and  Dictionary  Files  and  would  make  the  retrieval  process  more 
direct.  Each  LOF  entry  in  the  Data  File  would  consist  of  only  two  numbers, 
the  original  reference  number  and  the  present  code  number.  The  code  number 
in the dictionary would be reduced to two numbers for all qualitative MOF's:
the first and last range values. The range values would be used for normal
retrieval and the present code numbers for "synonym lock-out" retrieval. If a
LOF  number  were  in  the  range  of  the  requested  LOF,  the  criterion  would  be 
satisfied  unless  synonym  lock-out  were  desired,  in  which  case  only  an  exact 
match  would  suffice. 
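
The single-number coding can be sketched in Python as a pre-order numbering
of the tree, with each term also carrying the first and last code numbers of
its structural range. This is an illustration only; the function names
`assign_codes` and `satisfies` are hypothetical, the tree is an assumed
reading of the (PGS) example, and variant spellings are omitted.

```python
def assign_codes(children, synonyms, roots):
    """Return {term: (code, range_first, range_last)} by pre-order numbering."""
    table = {}
    counter = 0

    def visit(name):
        nonlocal counter
        counter += 1
        first = counter
        group = [(name, first)]
        for synonym in synonyms.get(name, []):   # synonyms follow their term
            counter += 1
            group.append((synonym, counter))
        for child in children.get(name, []):     # then the less generic terms
            visit(child)
        last = counter                           # no gaps inside the range
        for term, code in group:
            table[term] = (code, first, last)

    for root in roots:
        visit(root)
    return table

def satisfies(table, stored, requested, lock_out=False):
    """Does a data point coded with `stored` satisfy a request for `requested`?"""
    code = table[stored][0]
    requested_code, first, last = table[requested]
    if lock_out:                                 # "synonym lock-out": exact match
        return code == requested_code
    return first <= code <= last

# Assumed tree for the MOF (PGS), before the new entries are added.
children = {
    "ANTHROPOIDEA": ["CATARRHINI", "PLATYRRHINI"],
    "CATARRHINI": ["PONGIDAE", "CERCOPITHECIDAE"],
    "PLATYRRHINI": ["HAPALIDAE", "CEBIDAE"],
    "PROSIMII": ["LEMUROIDEA", "TARSIOIDEA"],
}
synonyms = {
    "CATARRHINI": ["OLD WORLD ANTHROPOIDS"],
    "PONGIDAE": ["SIMIIDAE", "GREAT APES"],
}
table = assign_codes(children, synonyms, ["ANTHROPOIDEA", "PROSIMII"])
print(table["GREAT APES"])
print(satisfies(table, "GREAT APES", "ANTHROPOIDEA"))
```

Retrieval then needs only one comparison of a code number against a range,
but inserting a new term renumbers every term that follows it, which is the
maintenance cost weighed in the text.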


Weighing the pros and cons of this alternative method, the advantages
of brevity seem to be more than offset by the requirement that the entire
Data File (rather than just the new or incomplete data points) be translated
against the Dictionary File after virtually any type of file maintenance is
performed on the Dictionary File. (This alternative method would not affect
the remainder of the MOD system.)


Consider again the MOF, "Primate groups involved in study (PGS)", with
the tree structure shown, underlined LOF's to be added to the MOF:

[Figure: tree structure of the MOF (PGS). Top level: Anthropoidea (Higher
Primates); Prosimii (Lower Primates). Second level: Catarrhini (Catarhini,
Catarini, Old World Anthropoids); Platyrrhini; Lemuroidea; Tarsioidea;
Tupaioidea. Third level: Pongidae (Pongiidae, Simiidae, Simidae, Great
Apes); Cercopithecidae; Hapalidae; Cebidae.]
The MOF, (PGS), would appear in the Dictionary File as follows --
after its initial construction (without the underlined entries). The disk
format is used for brevity.


                            CODE      RANGE       REF

ANTHROPOIDEA                  1      1    10       1
CATARHINI                     2      2     7       2
CATARRHINI                    2      2     7       2
CEBIDAE                      10     10    10      10
CERCOPITHECIDAE               7      7     7       7
GREAT APES                    6      4     6       6
HAPALIDAE                     9      9     9       9
LEMUROIDEA                   12     12    12      12
OLD WORLD ANTHROPOIDS         3      2     7       3
PLATYRRHINI                   8      8    10       8
PONGIDAE                      4      4     6       4
PONGIIDAE                     4      4     6       4
PROSIMII                     11     11    13      11
SIMIDAE                       5      4     6       5
SIMIIDAE                      5      4     6       5
TARSIOIDEA                   13     13    13      13

(The numeric portion of the file repeats these values in reference-number
order, 1 through 13.)

Note that the reference number and code number are identical -- after
the initial construction of a MOF.


After the new (underlined) LOF's are added, (PGS) would be as
follows:


                            RANGE

ANTHROPOIDEA                 1    11
CATARHINI                    3     8
CATARINI                     3     8
CATARRHINI                   3     8
CEBIDAE                     11    11
CERCOPITHECIDAE              8     8
GREAT APES                   5     7
HAPALIDAE                   10    10
HIGHER PRIMATES              1    11
LEMUROIDEA                  14    14
LOWER PRIMATES              12    16
OLD WORLD ANTHROPOIDS        3     8
PLATYRRHINI                  9    11
PONGIDAE                     5     7
PONGIIDAE                    5     7
PROSIMII                    12    16
SIMIDAE                      5     7
SIMIIDAE                     5     7
TARSIOIDEA                  (not legible)
TUPAIOIDEA                  (not legible)


Note that the LOF code numbers are not now the same as the LOF
reference numbers.
7.3  SYNTHESIS  SUBSYSTEM


After all the pertinent data point records have been selected by the
Retrieval Subsystem the Synthesis Subsystem performs necessary and desirable
refining operations upon these points. The synthesized data is then used to
produce maps and reports. These refinements consist of both necessary
synthesizing operations, which are required to combine the data properly,
and optional manipulative calculations, which are specified by the MOD user
in his query request. Operation of this subsystem is diagrammed in Fig. 7.5.

Since  geographic  considerations  are  of  the  utmost  importance  throughout 
the  MOD  system,  the  geographic  location  of  each  data  point  must  be  adequately 
represented  to  fulfill  all  functional  requirements.  These  requirements  in¬ 
clude: 


(1)  Validation  of  input  data  —  specified  in  terms 
of  political  units,  or  longitude  and  latitude, 
or  both. 

(2)  Consistent internal storage of the location in
the MOD Data File.

(3)  Convertibility  to  either  verbal  descriptions 
(for  output  reports)  or  to  X,  Y  coordinates 
(for  mapping). 

(4)  Proper  interpretation  in  query  requests. 

(5)  Combination  or  coordination  characteristics  by 
which  the  data  points  can  be  combined,  refined, 
and  enhanced  for  output  representation. 

7.3.1  DICTIONARY  FILE  (LOCATION  FUNCTIONS) 


In  the  MOD  system  the  Dictionary  File  is  required  to  accomplish  the 
following  two  functions  dealing  with  geographic  locations: 

(1)  Gazetteer function -- in which all geographic
names must be described in terms of a generic
tree-structure with synonyms and variant spellings
(like the MOF's previously discussed).

(2)  Grid function -- in which a sufficient number
of geographic points are identified to provide
mapping coordinates; these coordinates must also
be associated with some geographic name and must
fit into the same logical form as do the other
entries in the Dictionary File.

The gazetteer function can be achieved if the geographic names are
described in terms of their political unit designations. These designations
are mutually exclusive and provide a tree-structured hierarchy of country,
province/state, county, and smaller unit. The smaller unit could consist of
cities, towns, military installations, etc. Additional geographic levels
may be added for continent, area of a country, parts of a state, etc. The
gazetteer function allows construction of regional areas from any group of
political units which are of the same tree level, e.g., the countries which
comprise Southeast Asia, the states which make up the southwest portion of
the United States, and the counties which constitute southern California.

For the proper operation of the Dictionary File all entries in a given
MOF must have the same number of tree-structure levels, but it is not
required that all of these have positive values; if no appropriate group name
can be assigned for a particular collection of political units, the group
designation is left blank. In some instances several geographic area levels
may be constructed by nesting of mutually exclusive political units. With
the proposed system it is also possible to construct geographic areas from a
subset of political units which are not mutually exclusive with a higher
level of political unit, for example, the "Delmarva peninsula" or "Rocky
Mountains." From the viewpoint of tree structure, the level of such an entry
would be both higher and lower than the state level. Actually, as will be
shown, such a data point would be assigned to a coordinate within one of the
appropriate states for mapping purposes. For this reason there would be
advantage in having such terms as "Delmarva" undefined in the Dictionary
File, allowing the data analyst to re-designate such a data point after it
was rejected in the MOD Storage Subsystem. Possible re-designations for
"Delmarva" would include "Maryland, Eastern Shore", "Virginia, Eastern Shore",
and "Delaware". The geographic area, "Eastern Shore", would require also a
state designation in order to find the proper dictionary-type entry.

Certainly the Dictionary File will contain many non-unique geographic
names. For example, there are many "Washington" counties and several
"Washington" cities in the United States, and such names must have additional
geographic designations in order to make the entry unique. Ordinarily the
geographic location of input data will include more than one designation,
e.g., city, state, country, etc.

Geographic  input  names  can  be  processed  as  follows  in  the  MOD  Storage 
Subsystem: 

(1)  Each geographic designation is treated as a
LOF; LOC itself is treated as one MOF.

(2)  In  the  TRANSLATE  program  all  dictionary  entries 
which  match  the  alphabetical  spelling  of  each 
such  LOF  are  carried  on  LOF  records  for  the 
data  point. 

(3)  In  the  last  pass  of  the  SORT  TRANSLATED  DATA 
program  all  LOF  records  for  the  location  of 
each  data  point  are  matched. 

Since the LOF code number for each location contains the LOF reference
numbers for all the higher-level geographic names, only one LOF code
number will be consistent with the other higher-level reference numbers,
assuming that the geographic location was sufficiently specified in the
input. By this method the geographic locations need be specified by only
enough levels to be uniquely defined. Obviously, conventions would have
to be defined so that, for example, the single input location entry, NEW
YORK, would always be interpreted to mean NEW YORK STATE. If the location
designation(s) were insufficient for unique interpretation, the location
entry would be listed as an error. If desired, the several possible
locations for that entry could also be provided.
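
The matching of location designations can be sketched in Python as follows.
This is a simplified illustration, not the TRANSLATE or SORT TRANSLATED DATA
programs: the gazetteer entries are a small assumed sample, each entry is
written as its full chain of names rather than as reference numbers, and the
function names `candidates` and `resolve` are hypothetical.

```python
# Each entry lists the names from the highest level down to the entry itself.
GAZETTEER = [
    ("UNITED STATES",),
    ("UNITED STATES", "MARYLAND"),
    ("UNITED STATES", "MARYLAND", "WASHINGTON"),   # a county
    ("UNITED STATES", "OREGON"),
    ("UNITED STATES", "OREGON", "WASHINGTON"),     # a county
    ("UNITED STATES", "WASHINGTON"),               # the state
]

def candidates(designations):
    """Entries whose chain contains every input name and ends on one of them."""
    names = {d.strip().upper() for d in designations}
    return [entry for entry in GAZETTEER
            if names <= set(entry) and entry[-1] in names]

def resolve(designations):
    """Return the single consistent entry, or raise listing the possibilities."""
    found = candidates(designations)
    if len(found) != 1:
        raise ValueError(f"location not unique: {designations!r} -> {found!r}")
    return found[0]

print(resolve(["Washington", "Maryland"]))
print(candidates(["Washington"]))
```

The second call returns three possible locations, which is the case the text
would list as an error unless a convention (such as the NEW YORK example)
settles the interpretation.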
The grid function of the Dictionary File can be accomplished if the
longitude and latitude coordinates that are to be used in mapping are
assigned for the center-of-area of each defined geographic name. Then
additional point locations must be associated with the center-of-area point
and with the geographic name, to represent the geographic extent of the area.
These additional locations (which may be called grid points) are required
for shaded maps; they can also be used in contour maps. Grid points are
determined in relationship to a grid size (the coordinate distance between
grid points in a given geographic area). It is not essential that all areas
of the earth be given the same grid size. Large bodies of water, deserts,
and so forth should also be assigned coordinates so that such areas can be
recognized and differentiated from areas likely to have valid contributory
disease/environmental data. This type of distinction is essential in areas
in which the grid size is large (i.e., coarse). The designation of
non-applicable areas (i.e., those without valid contributory data) may be used
to enhance all types of mapped output since they allow differentiation
between non-applicable locations and locations for which no (retrieved) data
points exist. This facility is considered essential for computer production
of contour maps.

The gazetteer function of the Dictionary File can be fulfilled in
exactly the same manner as has been described for the dictionary functions,
with the exception that additional processing is required to combine the
various LOF level designations into one unique location. If the longitude
and latitude coordinate values are used for the LOF reference number, rather
than somewhat randomly created integers, these values provide a basis for
the grid function of the Dictionary File. A prime characteristic of the
LOF reference numbers is that each one must be unique within the MOF. This
unique characteristic of longitude and latitude coordinate points can be
realized if each higher geographic level in the MOF is assigned an additional
digit that is not significant from the standpoint of geographic location.
Thus, if the finest grid considered is 1/10° for the lowest possible level
of location designation, the next higher level could be designated in terms
of 1/100°, where the least significant digit is not zero. These higher level
reference numbers would be the approximate center point of the geographic
area defined. The lowest geographic level coordinates of each grid point
would have to be manually assigned. The higher level coordinates for those
points could then either be manually assigned or computer generated. The
latter operation is possible because the grid function can be considered as
a tree structure in which each level is completely described by the next
lower level.

There  need  be  and  should  be  only  one  building  and  maintenance  program 
to  satisfy  both  gazetteer  and  grid  functions  of  the  Dictionary  File. 

A single program would ensure that all the location data were consistent.
It seems desirable, however, that the gazetteer and grid entries exist in
two different physical files in the MOD system in order to facilitate
their use. One reason for this is that the additional grid points
which are defined to indicate the geographic area associated with a location
name are unnecessary for the lowest designation in the Dictionary File.

Using  separate  files,  the  gazetteer  records  would  be  maintained  in  both 
alphabetic  and  numeric  order,  as  would  comparable  records  in  the  Dictionary 
File. Grid-point records, perhaps stored as a separate grid file, would be
sequenced  by  the  grid  coordinates  of  all  the  component  locations;  they  need 
not  include  the  synonym  and  variant  spelling  entries.  All  bodies  of  water, 
etc.,  could  be  grouped  under  one  tree  level,  and  appear  only  in  this  grid 
file. If desired, there could be a dictionary of body-of-water names.

These names could form a tree structure in themselves with synonyms and
variant spellings for the input and retrieval of data concerning aquatic
environments. But such a structure would have to be a separate branch of
the location tree-structure since the geographic locations of water bodies
cannot always be uniquely correlated with the MOD political unit boundaries.

The building and maintenance operations for the gazetteer and grid
functions will be accomplished in a manner similar to that for any MOF in
the MOD Dictionary File, although additional processing is required to
create entries appropriate for these functions.


The gazetteer and grid cards will have the same format and contents
as ordinary Dictionary File cards, with the following exceptions:


(1)  Grid  cards  for  a  location  will  have  no  textual 
description. 

(2)  Each  lowest  location  level  (on  both  gazetteer 
and  grid  cards)  must  contain  a  reference  number 
which  consists  of  its  longitude  and  latitude. 

(3)  The format of the input cards will have a shorter
field for the textual description, the relational-indication
field will be moved to the left, and
the reference-number field will be longer to
accommodate exception (2).


For example, location input cards and the resultant Dictionary File
records for the State of Delaware might appear as shown on the next three
pages. (The grid size has been selected as 1/5°, and higher geographic
levels have been omitted for brevity.)

Dictionary File

Textual Description          Relational     Reference Number
                             Indicator      Long.      Lat.

DELAWARE                         1
NEW CASTLE                       2
WILMINGTON                       3          -76.6      +39.8
ELKTON                           3          -76.8      +39.6
                                 3          -76.8      +39.8
                                 3          -76.8      +39.4
                                 3          -76.6      +39.4
KENT                             2
                                 3          -76.8      +39.2
DOVER                            3          -76.6      +39.2
                                 3          -76.6      +39.0
                                 3          -76.4      +39.0
SUSSEX                           2
OWENS TRACT STATE FOREST         3          -76.6      +38.8
OWENS FOREST                     $
OWENS TRACT                      $
ELLENDALE STATE FOREST           =
ELLENDALE FOREST                 $
MILTON                           3          -76.4      +38.8
                                 3          -76.6      +38.6
                                 3          -76.4      +38.6
                                 3          -76.2      +38.6

These cards would also result in the following grid entries in the
Dictionary File.

Name                        Code Number

DELAWARE                    -76.581  +39.143     0       0       0      0
SUSSEX                      -76.581  +39.143  -76.41  +38.71     0      0
                            -76.581  +39.143  -76.41  +38.71  -76.2  +38.6
                            -76.581  +39.143  -76.41  +38.71  -76.4  +38.6
MILTON                      -76.581  +39.143  -76.41  +38.71  -76.4  +38.8
                            -76.581  +39.143  -76.41  +38.71  -76.6  +38.6
OWENS TRACT STATE FOREST    -76.581  +39.143  -76.41  +38.71  -76.6  +38.8
KENT                        -76.581  +39.143  -76.61  +39.11     0      0
                            -76.581  +39.143  -76.61  +39.11  -76.4  +39.0
                            -76.581  +39.143  -76.61  +39.11  -76.6  +39.0
DOVER                       -76.581  +39.143  -76.61  +39.11  -76.6  +39.2
                            -76.581  +39.143  -76.61  +39.11  -76.8  +39.2
NEW CASTLE                  -76.581  +39.143  -76.72  +39.61     0      0
                            -76.581  +39.143  -76.72  +39.61  -76.6  +39.4
WILMINGTON                  -76.581  +39.143  -76.72  +39.61  -76.6  +39.8
                            -76.581  +39.143  -76.72  +39.61  -76.8  +39.4
ELKTON                      -76.581  +39.143  -76.72  +39.61  -76.8  +39.6
                            -76.581  +39.143  -76.72  +39.61  -76.8  +39.8

After the MOD data input cards are translated by the Dictionary File,
all of the input location designations for a data point will be represented
by a single LOF code number in the data point record in the Data File. This
LOF number consists of the locations of the center points of all the
pertinent geographic groupings of the data point. Thus, the various
geographic levels may be directly referenced in MOD processing if desired,
i.e., the continent, country, province, etc., of any data point can be
immediately determined. This method of access does not enhance retrieval,
however, since requesting a given country or province, in terms of location,
would yield the same results. Direct access to the geographic levels of a
data point will be beneficial for some operations which require manipulation
and calculation or combination operations.

Each LOF reference number in the LOF code for a location actually consists
of longitude and latitude coordinates. Since the lowest (non-synonym)
level number provides the most precise geographic location of the data point,
it is advantageous to repeat these coordinates in each data point record.
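
One way to picture such a LOF code is as a fixed-length sequence of
reference numbers, one per geographic level, with unused lower levels left
empty. The Python sketch below is illustrative only; the names Ref, level,
most_precise, and belongs_to are hypothetical, and reference numbers are
held as strings so that the non-significant digits are preserved.

```python
from typing import NamedTuple, Optional, Tuple


class Ref(NamedTuple):
    """One LOF reference number: a longitude/latitude pair."""
    lon: str
    lat: str


# A LOF code holds one reference number per level, highest level first;
# None stands for the zero entries of levels that are not specified.
LofCode = Tuple[Optional[Ref], ...]


def level(code: LofCode, n: int) -> Optional[Ref]:
    """Directly reference geographic level n (1 = highest)."""
    return code[n - 1]


def most_precise(code: LofCode) -> Optional[Ref]:
    """Lowest specified level: the most precise location of the data point."""
    for ref in reversed(code):
        if ref is not None:
            return ref
    return None


def belongs_to(code: LofCode, area: LofCode) -> bool:
    """True if `code` lies within `area`, assuming `area` is specified
    from level 1 down to some depth with no gaps."""
    depth = sum(ref is not None for ref in area)
    return code[:depth] == area[:depth]


state = Ref("-76.581", "+39.143")
county = (state, Ref("-76.61", "+39.11"), None)
town = (state, Ref("-76.61", "+39.11"), Ref("-76.6", "+39.2"))
print(level(town, 2), most_precise(town), belongs_to(town, county))
```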

7.3.2  QUERY  REQUESTS 

With this representation of the geographic location in the Data File,
any area can be referenced in terms of the longitude and latitude coordinates
of its geographic name. Moreover, the desired boundaries of a map could,
theoretically, be expressed in terms of retrieval conditions or output
specifications. Thus a map of South American data could be produced by
requesting that the output be "South America" (or "Latitude S30 to N15,
Longitude W85 to W30") or, more precisely, by specifying that
"(LOC) = South America" (or "(LON ≥ -85) + (LON ≤ -30) + (LAT ≥ -30) +
(LAT ≤ +15)"). However expressed, it is obvious that only appropriate data
points should be considered in the Retrieval Subsystem.
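
A retrieval condition of the coordinate kind can be sketched in a few
lines of Python. The record layout (a dictionary with LON and LAT entries)
and the function name are assumptions made for illustration; the limits are
those quoted above for South America.

```python
def in_box(point, lon_min, lon_max, lat_min, lat_max):
    """True when the data point lies inside the requested map boundaries."""
    return (lon_min <= point["LON"] <= lon_max
            and lat_min <= point["LAT"] <= lat_max)


data_points = [
    {"ID": 1, "LON": -60.0, "LAT": -10.0},   # inside the requested area
    {"ID": 2, "LON": -76.6, "LAT": 39.2},    # north of the requested area
    {"ID": 3, "LON": -20.0, "LAT": 0.0},     # east of the requested area
]

# LON from -85 to -30 and LAT from -30 to +15, all four conditions required.
selected = [p for p in data_points if in_box(p, -85, -30, -30, 15)]
print([p["ID"] for p in selected])
```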

As previously stated, a query request consists of: retrieval conditions,
synthetic or manipulative operations, and output specifications. It
is envisioned that, after implementation of and operational experience with
the MOD system, all three of these aspects will be included in a single
comprehensive query language, well suited for use by the medical profession.
Ultimately, then, the entire query, expressed in that query language, will
be interpreted by a preliminary program that will identify and isolate the
processing requirements in terms of the Retrieval, Synthesis, and Output
Subsystems. But until the MOD query language is developed, each of these
subsystems will require its own control-card input, and the user will have
to specify every operation to be accomplished in each subsystem. However,
the processing required in the Synthesis and Output Subsystems is often so
interrelated that completely separate control cards for these two
subsystems would require unnecessary manual effort on the part of the user;
furthermore, it would lead to errors of inconsistency. For this reason,
in the interim system for query requests, it is recommended that the request
control cards be limited to two categories: (1) retrieval, and (2) synthesis
and output. The interim system can generate the processing required in both
the latter subsystems from a single request entry. Examples of a complete
set of synthesis and output control cards will be given after MOD system
output usage is discussed.

7.3.3  CALCULATIONS 

The manipulative or synthetic operations desirable in the synthesis
subsystem are those which can be utilized for both optional calculations
and required mapping calculations. These operations can be meaningfully
performed upon any single-LOF quantitative MOF (i.e., a MOF whose LOF's are
numbers, in contrast to a qualitative MOF, whose LOF's are words). These
operations could include the calculation of the total, maximum, minimum, mean,
median, and other arithmetic combinations from the numbers contained in any
such MOF as found in a group of separate data points.

Some of these calculations require (or can be achieved by) sorting the
Retrieved Data File. The selection or rejection of the greatest or least
numerical LOF's in a quantitative MOF is such an operation. Of course the
interpretation given to such relative maximums and minimums varies with the
requested MOF. For example, in MOF's involving time, the greatest and least
values would represent the most recent and the oldest data points,
respectively, if the MOF is properly constructed.

Calculations of maximum, minimum, and averages could be accomplished
with additional processing in the Retrieval Subsystem for a limited number
of MOF's. It is recommended, however, that in the initial development of
the MOD system, all statistics be calculated in the Synthesis Subsystem
because:

(1)  Not all calculations could be performed at
retrieval time.

(2)  All or some of the calculations may be utilized
during the combination portion of the Synthesis
Subsystem.

(3)  If these calculations are performed for all data
points, the basic retrieval operations are
extraneous.

(4)  Since other calculations may be desired later,
there is advantage in (eventually) designing a
general-purpose calculating program rather than
modifying the (interim) closed Retrieval Subsystem.

(5)  Control card formats can be much simplified.

The output of the MOD system can be considered as a summary of certain
characteristics of the MOD data -- maps give a pictorial summary, the reports
a verbal summary. The desired characteristics are located by the Retrieval
Subsystem and then summarized by the Synthesis Subsystem. Each of the
synthetic or manipulative operations provides a different type of summary and
can be performed on any quantitative MOF for the entire set of retrieved
data points. Moreover, these operations can also be performed for any
well-defined homogeneous subset of the retrieved data. Consider these examples:

(1) the average number of cases of a specific disease could be determined
with respect to all of the data points, each of which had all the other
desired characteristics; (2) the average prevalence or incidence of a
particular disease for each given year could also be ascertained from a group
of data points having the necessary elements, and this average could either be
based upon all the data points or upon only those from times when there were
no epidemics occurring.

Illustration (2), above, is an example of performing a calculation on
a single-LOF quantitative MOF, say A, for each LOF of another MOF, say B.
For simplicity of expression let us represent this type of operation by
f(A):(B), where f is any defined calculation. If the calculation is to be
performed with respect to the entire file, let us denote this operation by
f(A):(#) for compatibility. B need not be numeric for the operation to be
meaningful since the calculation is performed for each different LOF in B.
The usual data processing technique employed to accomplish f(A):(B) is to
sort the entire (Retrieved Data) File by B and then to calculate f(A) for
all A's which have the same B. Thus, in effect, a separate calculation is
performed every time B changes. In this calculation, it is assumed that
there is only one MOF "A" and one MOF "B" in each data point record, hence
f(A):(B) is an inter-record calculation.
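
The sort-then-calculate technique for f(A):(B) can be sketched as below.
This is a minimal Python illustration under stated assumptions: records are
dictionaries keyed by MOF designation, and the function names and the YEAR
and CASES designations are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def calc_by(f, a, b, records):
    """f(A):(B): sort by B, then apply f to the A's sharing each B."""
    ordered = sorted(records, key=itemgetter(b))
    return {
        key: f([r[a] for r in group])            # a new calculation starts
        for key, group in groupby(ordered, key=itemgetter(b))  # when B changes
    }


def calc_all(f, a, records):
    """f(A):(#): apply f to A over the entire file."""
    return f([r[a] for r in records])


records = [
    {"YEAR": 1966, "CASES": 5},
    {"YEAR": 1965, "CASES": 10},
    {"YEAR": 1965, "CASES": 20},
]
print(calc_by(sum, "CASES", "YEAR", records))
print(calc_all(max, "CASES", records))
```

Note that B is used only as a sort and grouping key, which is why it need
not be numeric.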

It may be desirable to perform calculations on several single-LOF
quantitative MOF's within each data point record. This is an intra-record
calculation and can be denoted by f(A,B,...,X).

In the initial Synthesis Subsystem it is recommended that a general-purpose
CALCULATE program be written to compute only the following calculations:

Calculation              Suggested f-designation

TOTAL                    TOT
MAXIMUM                  MAX
N-GREATEST               MAXN
MINIMUM                  MIN
N-LEAST                  MINN
MEAN                     MEAN
MEDIAN                   MED
STANDARD DEVIATION       SD
ADD                      +
SUBTRACT                 -
MULTIPLY                 *
DIVIDE                   /

The CALCULATE program would contain each defined calculation as a
subroutine, thus meaningful combinations of these calculations would be
relatively simple to perform. However, an order-of-operation or
parenthetical-grouping standard or convention would have to be established
to make these combinations well-defined.
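
A table-driven arrangement of such subroutines might look like the
following Python sketch. It is an illustration only: the dictionary layout
and the calculate function are hypothetical, the population form of the
standard deviation is an assumption (the report does not say which form is
intended), and explicit nesting of calls stands in for the
parenthetical-grouping convention.

```python
import heapq
import operator
import statistics

# f-designation -> subroutine operating on a list of numeric LOF's.
CALCULATIONS = {
    "TOT": sum,
    "MAX": max,
    "MIN": min,
    "MEAN": statistics.mean,
    "MED": statistics.median,
    "SD": statistics.pstdev,      # assumption: population standard deviation
}

# Designations that also need a count N.
N_CALCULATIONS = {"MAXN": heapq.nlargest, "MINN": heapq.nsmallest}

# Arithmetic designations combine two earlier results.
ARITHMETIC = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(designation, values, n=None):
    """Dispatch one defined calculation by its f-designation."""
    if designation in N_CALCULATIONS:
        return N_CALCULATIONS[designation](n, values)
    return CALCULATIONS[designation](values)


# A combination with explicit grouping: TOT(positives) / TOT(sample sizes).
ratio = ARITHMETIC["/"](calculate("TOT", [1, 10]), calculate("TOT", [2, 100]))
print(ratio)
```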

Since the results of these calculations must be transmitted within the
system, we will now consider their internal representation within the system.

Calculations  which  are  performed  with  respect  to  the  entire  retrieved 
data  file  will  be  contained  in  a  generated  last  record  of  the  data  file. 

Intra-record  calculations  will  be  stored  in  a  new  MOF  of  the  data 
record  and  given  a  MOF  designation  equivalent  to  their  (calculation)  f- 
designation,  also  indicating  what  MOF's  were  involved  in  their  calculation. 
The  resultant  calculation  will  also  be  stored  in  main  memory  for  possible 
utilization  as  an  operand  in  another  calculation. 

Inter-record calculations can be constructed in either of the following
ways:

(1)  All of the original data point records plus summary records.

Each summary record would contain f(A) as the LOF for
the MOF A, and would immediately follow the group of
data point records to which it pertained.

(2)  Summary records only.

In these records, any subset of the summarized MOF's,
i.e., all MOF's which were constant during the
calculation, plus f(A), could be written for each
resultant calculation. This second type would require
an additional specification to f(A):(B), which would
indicate those MOF's to be included in the summary
record.
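
Both forms of output can be produced by one routine, as in the Python
sketch below. The function name, the SUMMARY marker, and the keep argument
(the extra MOF's to copy into a summary record) are hypothetical; the MOF's
named in keep are assumed to be constant within each group.

```python
from itertools import groupby
from operator import itemgetter


def summarize(f, a, b, records, summary_only=False, keep=()):
    """Write f(A):(B) results as summary records.

    summary_only=False: original records, each group followed by its summary.
    summary_only=True:  summary records only, carrying the MOF's in `keep`.
    """
    out = []
    ordered = sorted(records, key=itemgetter(b))
    for key, group in groupby(ordered, key=itemgetter(b)):
        group = list(group)
        summary = {"SUMMARY": True, b: key, a: f([r[a] for r in group])}
        for mof in keep:                 # assumed constant within the group
            summary[mof] = group[0][mof]
        if not summary_only:
            out.extend(group)
        out.append(summary)
    return out


records = [
    {"PRO": "X", "VAL": 1, "DIS": "M"},
    {"PRO": "Y", "VAL": 5, "DIS": "M"},
    {"PRO": "X", "VAL": 2, "DIS": "M"},
]
for rec in summarize(sum, "VAL", "PRO", records):
    print(rec)
```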

7.3.4  SORTING 

The specification of the MOF B is sufficient to accomplish f(A):(B).
Actually, however, B may be the most minor MOF of several MOF's into whose
sequence the Data File was sorted. For the present, it is suggested that
all of the MOF's required for the proper sorting sequence be explicitly
stated on a sort card. In the ultimate MOD query language these sorts can
be automatically generated from the calculation specifications. In addition
to their utilization for calculations, sorts will be effected in order to
achieve a desired order in output reports. Unlike map output, the
significance of printed alphanumeric reports can be greatly enhanced by a
provision to vary the order in which the desired data points are listed. The
sorting does not require an additional computer run if the LOF's (or MOF's)
are to be printed in the form of their textual descriptions. But in this
event, the Data File must be matched against the Dictionary File to obtain
the proper descriptions. This process can best be accomplished if the data
records are decomposed into MOF/LOF records and sorted; hence, the resultant
file, containing the textual descriptions, would have to be re-sorted.
Only those MOF's which are to be listed need be decomposed and sorted. The
MOF's to be decomposed, and their desired output order, can both be obtained
from the output specifications without additional synthesis control cards
in the interim MOD system.

The nature of the MOD Data File requires that two features of its
structure be given special consideration in sorting. First, the variable
locations of each MOF in the file are not suited to standard sort programs.
This difficulty can easily be rectified by reformatting the data point
records so that all the MOF's to be sorted are placed in fixed locations as
they enter the first pass of the sort program. Secondly, a procedure must
be established for sorting multi-LOF qualitative MOF's. Although
quantitative MOF's can certainly be defined so that they are single-LOF, it
is often desirable to have multi-LOF qualitative MOF's. As the system has
been designed, all of the LOF's within a single MOF will have equal
significance, thus only the following two methods of processing multi-LOF
MOF's are feasible for sorting MOD data:

(1)  Sort  on  the  first  LOF  for  each  MOF  but  retain  the 
other  LOF's.  The  processing  in  the  MOD  Storage 
and  Retrieval  Subsystems  will  cause  the  several 
LOF's  to  be  sequenced  by  increasing  LOF  code 
number  within  each  MOF. 

(2)  Create  an  entire  new  data  point  record  for  each  of 
the several LOF's within a MOF. All of these
records  would  contain  the  same  data  plus  a  generated 
flag. 

Either of these methods could be specified for each (qualitative) MOF
which is to be sorted. For the present, the user would indicate the better
technique after considering the structure of the MOF and its contained LOF's,
the calculations required, and the type of printed or mapped output desired.
Both techniques could be used in the same sort program for different MOF's.
The general form of sort control card to sort the file by MOF's A, B, C, and
D, respectively, could be SORT (A,B,C,D), where A, B, C, and D are the MOF
designations. In the second technique for sorting multi-LOF MOF's, these
MOF designations could be prefaced by an asterisk. If no choice were
indicated, the first technique would automatically be employed. Often the
sort control will be supplied from the output specifications, and the same
conventions can apply in a print control statement. If a single-LOF
quantitative MOF is chosen for inappropriate processing, the sort statement
will be flagged and the option will be ignored for that MOF. Later
embellishments to the MOD system could assign the better method for each MOF
automatically, and could include options to reformat the entire data point
record prior to its being sorted.
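
The two techniques for multi-LOF MOF's can be sketched together in
Python. The sketch makes several assumptions that are not in the report:
each MOF in a record is a non-empty list of LOF code numbers already in
increasing order, the GENERATED key stands in for the generated flag, and
the flag is set only on the additional copies of a record.

```python
def sort_records(records, mofs):
    """Sort by the listed MOF designations, e.g. ["A", "*B"].

    Plain designation:  technique (1), sort on the first LOF, keep the rest.
    Asterisk prefix:    technique (2), one new record per LOF in that MOF.
    """
    expanded = [dict(r) for r in records]
    names = []
    for spec in mofs:
        name = spec.lstrip("*")
        names.append(name)
        if spec.startswith("*"):
            split = []
            for r in expanded:
                for i, lof in enumerate(r[name]):
                    copy = dict(r)               # same data in every copy
                    copy[name] = [lof]
                    copy["GENERATED"] = r.get("GENERATED", False) or i > 0
                    split.append(copy)
            expanded = split
    return sorted(expanded, key=lambda r: tuple(r[n][0] for n in names))


records = [{"ID": 1, "B": [3, 7]}, {"ID": 2, "B": [5]}]
print([r["ID"] for r in sort_records(records, ["B"])])
print([r["ID"] for r in sort_records(records, ["*B"])])
```

With the first technique record 1 is listed once, under its first LOF;
with the second it is listed again under its second LOF, after record 2.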

7.3.5 COMBINATIONS

Thus far we have considered summarizing operations which are optional
for either printed reports or maps. Certain summarizing operations
will always be required in order to produce a meaningful map from the
remaining retrieved data points, however. These summarizing operations
include the following:

(1)  Making the data point values consistent in form so
that they may be meaningfully compared and calculations
performed on them.

(2)  Combining all data points which possess identical
LOC's (locations).

(3)  Combining, then, all such points which will be
mapped at the same grid point.

Many possible ways of combining data points can be envisioned; most of them
reduce, essentially, to taking some sort of average LOC (location), and
coupling it with an average of the VAL's (values) of the data points.

The MOD Data File consists of data points extracted from medical papers
in which the degree of geographic significance will vary and in which the
geographic areas will be inexact. There may even exist data points which
reflect contradictory data, i.e., all their independent LOF's/MOF's and LOC
are identical but the values (VAL) of the points are different. Such
contradictory points could be purged during MOD Storage Subsystem processing
or allowed to remain in the Data File and then "combined" in accordance with
the established combination techniques. It is anticipated that many of these
"extra" data points will be eliminated from consideration for a specific map
output by the retrieval requests and processing. However, any points for the
same location which remain after such processing must be combined prior to
mapping.

The geographic points which must ultimately be combined to produce a
mappable point depend upon both the lowest level of geographic unit to be
considered and the desired grid size. For a given area the user may wish to
vary grid size since, as previously demonstrated, different grid sizes can
produce dissimilar maps. The size of the area to be mapped can also
influence the desired grid size. For example, maps of the world, or of a
country, or of a state would probably be automatically assigned grid-mesh
sizes of 1°, 1/2°, 1/10°, respectively. Any grid size larger than the
smallest grid size contained in the location part of the Dictionary File may
be constructed by combining all data points at the closest new grid point
location.
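
Construction of a larger grid can be sketched as snapping each point to
the nearest multiple of the new mesh size and combining the values that
land on the same point. The Python below is a hypothetical illustration:
the function names are invented, the total is used as the combining
calculation only by way of example, and exact half-way cases are rounded
away from zero by assumption.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP


def snap(coord, mesh):
    """Closest multiple of `mesh` to `coord` (both Decimal)."""
    steps = (coord / mesh).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return steps * mesh


def regrid(points, mesh, combine=sum):
    """points: (longitude, latitude, value) triples on the finer grid."""
    cells = defaultdict(list)
    for lon, lat, val in points:
        cells[(snap(lon, mesh), snap(lat, mesh))].append(val)
    return {cell: combine(vals) for cell, vals in cells.items()}


fine = [
    (Decimal("-76.6"), Decimal("39.2"), 4),
    (Decimal("-76.4"), Decimal("39.0"), 6),
    (Decimal("-76.8"), Decimal("39.8"), 1),
]
for cell, value in regrid(fine, Decimal("0.5")).items():
    print(cell, value)
```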

All of these necessary combinations of data points can be achieved by
use of the previously discussed calculation program (CALCULATE) of the
Synthesis Subsystem. When this program is used to effect final synthesis
for mapping (during initial MOD implementation), we suggest that no new
functions be defined for this purpose. Later, a new algorithm may be
developed which will optimize this operation, in which case that algorithm
can be added to the calculation capabilities of the entire Synthesis
Subsystem, and can be made the standard procedure unless another method is
explicitly specified by the user.

In our discussion of the MOD gazetteer and grid functions, it was noted
that once the LOC (location) had been established for a data point record,
all defined geographic levels of this location could be referenced if
desired. These levels appear to be a suitable criterion for the geographic
combination of data points. For example, a map displaying the total number
of cases of a specific disease per province could be produced by summarizing
the data point values (VAL) at the province (PRO) level. This calculation
could be performed by the synthesis statement TOT(VAL):(PRO) after the
retrieved data points had been sorted by location. A map of the same data
by county (CTY), and with the grid size decreased to 0.5°, could be
achieved by TOT(VAL):(CTY.5). In this calculation any data point whose
location did not include a county specification would, of course, be omitted
from the total.

In the MOD System, the mappable value (VAL) of a disease data point
may be represented in terms of both absolute numbers and rates or
percentages, and the final combination of data points for mapping will often
include the requirement that these percentages be combined. Although the
combination of percentages is less well-defined than that for absolute
numbers, various types of such combinations can be calculated if the values
and sample sizes of the data points to be combined are known. For example,
50% and 10% can be combined to yield a value of 10.784% if the respective
sampling were known to have yielded 1 out of 2 and 10 out of 100 cases
positive. However, the same percentages could also have been combined to
values of 30% or 60% depending upon the combination technique employed and
the sampling situation involved. In the MOD system, the data analyst will
probably be the one (initially) to specify the best method of combining
pertinent percentages. For example, the combination method could be
specified by TOT(VAL)/TOT(SAM):(PRO) if VAL and SAM were the MOF designations
for Value and Sample Size respectively, and a grouping by province were
desired.
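
The three figures quoted above can be checked with a short worked
example. In this Python sketch the pooled value is total positives over
total sample size; reading the 30% and 60% figures as the mean and the sum
of the two percentages is an interpretation, since the report does not
name the techniques that produce them.

```python
from fractions import Fraction

# (positive cases, sample size) for the two data points: 50% and 10%.
samples = [(1, 2), (10, 100)]

rates = [Fraction(pos, size) for pos, size in samples]

pooled = Fraction(sum(pos for pos, _ in samples),
                  sum(size for _, size in samples))   # 11/102
mean_of_rates = sum(rates) / len(rates)               # 3/10
sum_of_rates = sum(rates)                             # 3/5

for label, value in [("pooled", pooled),
                     ("mean of rates", mean_of_rates),
                     ("sum of rates", sum_of_rates)]:
    print(f"{label}: {float(value) * 100:.3f}%")
```

The pooled form weights each data point by its sample size, which is why
the small sample of 2 barely moves the result away from 10%.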

Often, the sample size for the disease measure cannot be determined
from the data included in the published report. For this reason, MOF's for
the largest and for the smallest sample sizes, (LSZ) and (SSZ), are
contemplated. In this event the data analyst must also specify the method
of calculating the sample size to be used. If in the example given above,
an average of the largest and smallest sample sizes were desired, the data
analyst would specify TOT(VAL)/TOT(MEAN(LSZ,SSZ)):(PRO). Of course if
the sample size were actually known, both MOF's would have to contain the
same numeric value if this calculation were to be well defined for all data
point records.
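
A statement of this kind mixes an intra-record calculation (the mean of
LSZ and SSZ within each record) with inter-record totals by province. The
Python sketch below shows one reading of it; the function name is
hypothetical, and VAL is assumed to hold the absolute number of positive
cases.

```python
from collections import defaultdict


def rate_by_province(records):
    """TOT(VAL) / TOT(MEAN(LSZ, SSZ)) : (PRO)"""
    total_val = defaultdict(float)
    total_size = defaultdict(float)
    for r in records:
        size = (r["LSZ"] + r["SSZ"]) / 2      # intra-record MEAN(LSZ, SSZ)
        total_val[r["PRO"]] += r["VAL"]       # inter-record TOT(VAL)
        total_size[r["PRO"]] += size          # inter-record TOT(...)
    return {pro: total_val[pro] / total_size[pro] for pro in total_val}


records = [
    {"PRO": "X", "VAL": 1, "LSZ": 2, "SSZ": 2},       # sample size known
    {"PRO": "X", "VAL": 10, "LSZ": 120, "SSZ": 80},   # only a range known
]
print(rate_by_province(records))
```

When the sample size is known, LSZ and SSZ carry the same number, so the
intra-record mean simply returns that number.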

7.3.6 ENHANCEMENT

For the production of maps, the final combination operation must
reduce the MOD data to longitude-latitude coordinates and values. This
collection of points may then be enhanced by the addition of other points
as the final processing step in the Synthesis Subsystem. The techniques by
which contour and shaded maps will probably be drawn require that those
points in non-applicable areas which fall within the longitude and latitude
range of the map to be produced be added in with the set of previously
processed (retrieved and combined) data points. These points can be added by
obtaining the non-applicable locations from the dictionary (grid) for the
area to be mapped. In addition, if a shading map is to be drawn, those
grid points which fall within the area under consideration, but which are
not contained in the retrieved data, must be added in order that the
geographic extent of the area be properly represented. Each of these points
must be given the same value as the appropriate retrieved data point, and
their locations can be obtained by matching the retrieved data points
against the dictionary (grid). (The same technique could be applied to
contour maps, but it is more common to use only the center points of each
geographic area for such maps.)
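
The enhancement step for a shaded map can be sketched as follows. This
Python illustration rests on assumptions: the grid part of the dictionary
is modeled as a mapping from area name to its grid points, each area has
already been reduced to a single value, and None is used as a placeholder
for non-applicable points.

```python
def enhance(area_values, grid, non_applicable=()):
    """Spread each area's value over all of its grid points, then add the
    non-applicable points that fall within the map range.

    area_values:    {area name: combined value}
    grid:           {area name: [(lon, lat), ...]} from the dictionary (grid)
    non_applicable: grid points to be added without a value
    """
    points = {}
    for area, value in area_values.items():
        for point in grid[area]:
            points[point] = value             # same value as the data point
    for point in non_applicable:
        points.setdefault(point, None)        # placeholder: no data applies
    return points


grid = {"KENT": [(-76.8, 39.2), (-76.6, 39.2), (-76.6, 39.0), (-76.4, 39.0)]}
mapped = enhance({"KENT": 12}, grid, non_applicable=[(-76.2, 39.0)])
for point, value in mapped.items():
    print(point, value)
```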

The inclusion of certain non-applicable points could be extended to
all types of maps to provide a graphic representation of the area boundaries.
At the present time, however, this process is unnecessary since these
boundaries would be evident on the base map (upon which the MOD distribution
map is to be overlaid).

7.4 OUTPUT SUBSYSTEM

The objective of the MOD System is to display medical data in terms of
geographic distribution. In order for the stored, retrieved, and synthesized
medical data to have significance for the user, the resultant data must be
meaningfully displayed. The Output Subsystem (see Fig. 7.6) provides this
display in the form of maps and reports. The Storage, Retrieval, and
Synthesis Subsystems will have produced internal records which contain the
required data, thus the Output Subsystem is the least complicated both
conceptually and structurally. (Actually, most of the output considerations
had to be formulated initially in order to design the other three
subsystems.)

7.4.1  REPORTS 

Reports will provide an alphanumeric or verbal listing of selected
MOD data. One program should suffice to accomplish all printing
requirements. Printouts will normally consist of several pages. Each page
will include a brief heading and the page number, and at least the first
page will also contain the entire query request. Usually, the textual
description of each desired LOF will be listed, but provision should also
be made to list only the LOF code number. This latter type of report would
be particularly useful in the early stages of implementing the MOD system
since it could eliminate the additional processing required to convert the
data file LOF codes into LOF names. In addition, the user should be able to
request that MOF's be listed by their textual description or by their MOF
code designation. As previously mentioned, the conversion of MOF's or LOF's
to their textual descriptions must be accomplished by the Synthesis
Subsystem to assure proper continuity within the system. In either event
any question marks associated with particular LOF's would be printed.

For additional flexibility, the MOD output reports can be furnished
in either of the following forms:

(1)  Free form: An entire data point or any portion of it
can be listed in free form across the printed page. If
the user specifies the MOF's to be listed, he also indicates
the order in which they are to be printed from left to
right. The first designated MOF will protrude to the left,
for each data point. If no MOF's are specified, all MOF's
will be listed, and they will appear in the same order as
in the data point record, with the data point number first
and the narrative last. A MOF description will immediately
precede its contained LOF's. If MOF code designations
are used, the MOF and LOF entries will be listed in free
form across the page. However, if the textual descriptions
of the MOF's are desired, each line will probably
contain only one MOF and all of its LOF's.

(2)  Fixed form: A tabular listing of portions of a data
point is printed on each line of the report. For this
type of report the user must not only specify the MOF's
to be listed but must also indicate the maximum number
of characters which he desires to be allotted for each
MOF. The MOF's will be spaced across the page automatically;
therefore the total maximum number of characters,
plus at least one space between each MOF, must be
no greater than the total number of print positions
across the page (usually 132). The right-most characters
of a LOF will be omitted if its textual description
exceeds the allotted number of characters. The first
line of each page (after the heading) will consist of
the MOF titles for each column. Again, these can be
either MOF code designations or textual descriptions.
The latter descriptions will also be truncated if there
is insufficient allotment for character length.

In fixed-form reports only the first LOF in a group of LOF's all
belonging to the same MOF will be listed; all such LOF's will be printed in
a free-form listing. For this reason, if the report is to be printed in
fixed form, ordinarily the user should elect to create multiple records in
the SORT program of the Synthesis Subsystem.
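
The fixed-form rules described above (an allotted width per MOF, at least
one space between columns, a page limit of 132 print positions, and loss
of the right-most characters of over-long entries) can be illustrated with
a short sketch. It is written in Python purely for illustration; the
function name fixed_form_line and its parameters are hypothetical and are
not part of the MOD design.

```python
PAGE_WIDTH = 132  # print positions across the page, as cited in the text


def fixed_form_line(values, widths, page_width=PAGE_WIDTH):
    """Format one tabular report line (hypothetical helper).

    values -- the entries to print, one per MOF column
    widths -- the maximum number of characters allotted to each column
    """
    if len(values) != len(widths):
        raise ValueError("one width is needed per MOF column")
    # Allotted characters plus one separating space between columns.
    total = sum(widths) + (len(widths) - 1)
    if total > page_width:
        raise ValueError("allotted widths exceed the available print positions")
    # Slicing drops the right-most characters; ljust pads short entries.
    cells = [str(v)[:w].ljust(w) for v, w in zip(values, widths)]
    return " ".join(cells).rstrip()


if __name__ == "__main__":
    print(fixed_form_line(["Schistosomiasis", "Kenya"], [8, 5]))
```

The same routine would serve for the column-title line, since the text
states that titles are truncated in the same manner.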

Calculations that were performed in the Synthesis Subsystem will also
be listed in output reports. But in the initial system, the data analyst
must ensure that all sorting and translation is accomplished prior to
inter-record calculations in order for the summary records to appear after
the appropriate group of data points. Inter-record calculations will be
printed as separate lines in either fixed-form or free-form reports.
Intra-record calculations will appear with the name of the calculation as
its MOF title within the record print-out for either type of report, if the
calculation is designated in the output specifications. Calculations for
the entire set of data will be listed last.
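
The requirement that sorting precede inter-record calculations can be
seen in a small sketch. The record layout (dictionaries with an "area"
key and a "cases" key) and the function name with_group_summaries are
assumptions made only for this illustration; the sum stands in for
whatever inter-record calculation is requested.

```python
from itertools import groupby


def with_group_summaries(records, key, value):
    """Return the records with a summary row after each run of equal keys.

    Only adjacent records with the same key form a group, so the input
    must already be sorted on that key for each summary to follow the
    whole of its group.
    """
    out = []
    for group_key, group in groupby(records, key=lambda r: r[key]):
        rows = list(group)
        out.extend(rows)
        out.append({key: group_key, "SUMMARY": sum(r[value] for r in rows)})
    return out


if __name__ == "__main__":
    data = [
        {"area": "A", "cases": 2},
        {"area": "A", "cases": 3},
        {"area": "B", "cases": 4},
    ]
    for row in with_group_summaries(data, "area", "cases"):
        print(row)
```

If the input were not sorted, records of one area separated by records of
another would produce several partial summaries instead of one.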

7.4.2  MAPS 

Maps (and the very similar block diagrams) will provide a pictorial
or graphic display of the selected MOD data. In the modular development
of the MOD System, experimentation with existing mapping techniques and
selected subsets of data did not proceed to the point of yielding final or
definite data-processing solutions to all mapping aspects and problems.
However, it is certain that the production of a finished map from the
synthesized data will (ordinarily) require three steps:

(1)  Project the enhanced data points in accordance
with the projection specified (generally the
same as that of a base or environmental map
with which the MOD disease map will be compared).

(2)  Grid the projected locations of these points for
contour and shaded maps.

(3)  Produce a map with a plotter.

7.4.2.1  Projection.  After the final synthesis and enhancement of the
data points, these records will contain two elements: the location
(longitude-latitude) of each point and the value (of that point) to be
mapped. But these longitude and latitude coordinates cannot be directly
transcribed onto a meaningful map, since these coordinates are actually
two-dimensional locations on a three-dimensional spheroid. Maps are
conventionally constructed from the projections of these coordinates,
except for maps of very small areas. Among the most commonly encountered
map projections are the Mercator, Miller cylindrical, and Goode's
homolosine projections. Most of these projections will be useful for the
MOD system since each displays a better pictorial characterization of earth
areas and distances under different circumstances. Moreover, since the
MOD-produced maps are to be overlaid onto existing environmental maps,
various projections of the MOD data will be required to make this data
correspond spatially or areally to the environmental data. Formulae by
which the longitude-latitude coordinate values can be transformed into X-Y
coordinates for any of these map projections are readily available; in
fact, there are existing computer programs which will perform most of these
transformations.
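
As an illustration of such formulae, the standard spherical forms of the
Mercator and Miller cylindrical projections are sketched below in Python.
The assumptions are ours, not the report's: a spherical earth of radius
`radius`, a central meridian at longitude zero, and the function names
mercator and miller. Goode's homolosine projection is omitted because it
is a composite, interrupted projection and does not reduce to a short
formula.

```python
import math


def mercator(lon_deg, lat_deg, radius=1.0):
    """Spherical Mercator: x = R*lon, y = R*ln(tan(pi/4 + lat/2))."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def miller(lon_deg, lat_deg, radius=1.0):
    """Miller cylindrical: x = R*lon, y = 1.25*R*ln(tan(pi/4 + 0.4*lat))."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * 1.25 * math.log(math.tan(math.pi / 4 + 0.4 * phi))
    return x, y


if __name__ == "__main__":
    for lat in (0, 30, 60):
        print(lat, mercator(10, lat), miller(10, lat))
```

Both projections share the same X value; they differ only in how latitude
is stretched, which is why a MOD map must be drawn on the same projection
as the base map it is to overlay.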

7.4.2.2  Gridding.  After the MOD data point locations have been
transformed by the appropriate projection, existing computer mapping
techniques require that the MOD data be gridded to produce either a contour
or shaded map. Gridding consists of constructing an array of new points
(the vertices of regular polygons) from the existing points. These polygons
are most often squares, rectangles, triangles, or hexagons, and are called
grid boxes. These grid boxes usually are constructed to have equal areas,
although a variable grid size is occasionally used. Each point on the new
grid is assigned a value by interpolating between the values of those data
points relatively near the new grid point. In some techniques the gridded
area is smaller than the original area; in others it is slightly larger.

Each interpolated value can be calculated by methods which range from a
consideration of only two of the original values to those which include
every original value, and from simple linear interpolation
to complex non-linear interpolation. Use of the nearest five to eight
original data points appears to be optimal.
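
One possible interpolation scheme of this kind is sketched below. The
report does not name a method; inverse-distance weighting over the k
nearest data points is our choice for illustration, and the function
name, the default k = 6, and the weighting power are all assumptions.

```python
import math


def interpolate(grid_xy, points, k=6, power=2.0):
    """Estimate a grid-point value from the k nearest data points.

    grid_xy -- (x, y) of the new grid point, in projected coordinates
    points  -- iterable of (x, y, value) for the original data points
    Uses inverse-distance weighting (an assumed method).
    """
    gx, gy = grid_xy
    nearest = sorted(points, key=lambda p: math.hypot(p[0] - gx, p[1] - gy))[:k]
    if not nearest:
        raise ValueError("no data points supplied")
    num = 0.0
    den = 0.0
    for x, y, value in nearest:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return float(value)  # grid point coincides with a data point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


if __name__ == "__main__":
    data = [(0, 0, 0.0), (2, 0, 10.0), (0, 2, 4.0), (5, 5, 50.0)]
    print(interpolate((1, 1), data, k=3))
```

Raising k toward the full data set, or changing the power, moves the
method along the range the text describes, from a few neighboring values
to every original value.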

The grid-point values obtained by these techniques are given to the
closest integer, which is suitable for the production of contour maps.
However, these methods would have to be modified for producing shading
maps, so that each grid-point value is the same as the original value for
that area. It may be desirable to grid non-data-valued locations in this
latter manner for both contour and shaded maps.

Grid criteria can be established so that dot, shaded, or contour maps
could be produced on a line printer. Each grid point would have to be one
of the print locations on a page, and the scale could not be varied from
10 x 6 or 10 x 8 points per square inch. Under these conditions contour
maps would have to be represented by groups of the same print character
rather than lines, in which case shading would consist of discrete
characters rather than continuous symbols.
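
A line-printer map of this kind amounts to printing one character per
grid point, chosen by value class. The sketch below assumes a rectangular
grid of numeric values, a list of class boundaries, and a hypothetical
symbol sequence of increasing density; none of these particulars come
from the MOD design.

```python
import bisect

SYMBOLS = " .:+*#"  # hypothetical symbols, lightest to densest


def printer_map(grid, breaks, symbols=SYMBOLS):
    """Return one text line per grid row, one character per grid point.

    breaks -- ascending class boundaries; a value equal to a boundary
              falls in the higher class
    """
    if len(symbols) < len(breaks) + 1:
        raise ValueError("one symbol is needed per class")
    lines = []
    for row in grid:
        lines.append("".join(symbols[bisect.bisect_right(breaks, v)] for v in row))
    return lines


if __name__ == "__main__":
    values = [
        [0, 2, 7, 12],
        [1, 5, 15, 30],
        [0, 3, 9, 22],
    ]
    for line in printer_map(values, [1, 5, 10, 20]):
        print(line)
```

Because each character occupies a fixed print position, the scale of the
result is fixed by the printer spacing, as the text notes.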

7.4.2.3  Production of Maps.  If MOD maps were to be produced on a printer,
the processing required to produce maps from the projected and gridded MOD
data would be relatively simple. However, it is envisioned that MOD maps
most commonly would be drawn with an automatic (digital) plotter. The
actual plotting operation is almost always accomplished off-line, i.e., a
magnetic tape is created during the system processing, and, subsequently,
this tape is used as input to the plotter device. The magnetic tape
consists solely of a series of X-Y plotter coordinate points and an
indication of whether the plotter pen is up or down between these points.
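
The content of such a plot tape can be pictured as a flat sequence of
moves. The sketch below converts a set of lines into (x, y, pen_down)
records; this record layout and the function name are hypothetical, since
the report does not specify a tape format.

```python
def polylines_to_records(polylines):
    """Flatten polylines into (x, y, pen_down) plotter moves.

    pen_down describes the move that ends at (x, y): the first vertex of
    each polyline is reached with the pen up, the rest with the pen down.
    """
    records = []
    for line in polylines:
        for i, (x, y) in enumerate(line):
            records.append((x, y, i > 0))
    return records


if __name__ == "__main__":
    contour = [(0, 0), (100, 0), (100, 50)]
    tick = [(500, 500), (500, 520)]
    for record in polylines_to_records([contour, tick]):
        print(record)
```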

The conversion of the grid coordinates to plotter coordinates
determines the scale of the resultant map. The plotter coordinates are
expressed in X-Y values with accuracy from 1/100 to 1/500 of an inch. One
inch on the plotted map can be equivalent to a varying number of miles in
different areas of the earth, depending upon the type of projection. The
scale of miles, in terms of inch equivalent, is obtained from standard
reference lines in each projection. The desired scale must be specified by
the user and should conform to the scale of the existing environmental map
with which the disease map will be compared. The conversion of the grid
points to any scale is always a linear transformation. Obviously, the
maximum dimensions of the plotter page must not be exceeded. For drum-type
plotters, one dimension can be indefinitely long, but both dimensions are
restricted if a flat-bed type plotter is used.
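
The linear conversion and the page-size check can be sketched as follows.
The parameter names, the default resolution of 100 steps per inch, and
the 30-inch page limit are assumptions for illustration; a limit of None
stands for the unrestricted dimension of a drum-type plotter.

```python
def to_plotter(points, units_per_inch, steps_per_inch=100,
               max_inches=(None, 30.0)):
    """Scale projected (x, y) points to integer plotter steps.

    units_per_inch -- projected units represented by one plotted inch
    steps_per_inch -- plotter resolution (1/100 inch by default)
    max_inches     -- (x_limit, y_limit) of the page; None means no limit
    """
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    out = []
    for x, y in points:
        # Linear transformation: shift to the page origin, then scale.
        px = (x - min_x) / units_per_inch
        py = (y - min_y) / units_per_inch
        for value, limit in ((px, max_inches[0]), (py, max_inches[1])):
            if limit is not None and value > limit:
                raise ValueError("map exceeds the plotter page at this scale")
        out.append((round(px * steps_per_inch), round(py * steps_per_inch)))
    return out


if __name__ == "__main__":
    print(to_plotter([(0, 0), (50, 25), (120, 80)], units_per_inch=10))
```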

The plotter instructions to produce any desired legends, numbering,
and register marks are also represented on the plot tape by a series of
X-Y coordinates. For plotter efficiency it is recommended that the
identification and the legend drawn with the map be brief, and that lengthy
groups of characters, such as the entire query request, be listed on a
printer.

A computer program is necessary to create the plotter tape from the
gridded (or projected) data points. For contour maps alone, many such
programs already exist, but each was designed for a specific application
unlike that of the MOD project, and we do not yet know which program would
be the most generally suitable for MOD purposes. These various programs
produce quite dissimilar maps with the same data. Although some existing
programs have worked well with certain MOD data, it is not fully apparent
yet whether it will be more desirable (ultimately) to modify an existing
program or to design and implement an entirely new one.

Different programs will probably be required to produce various types
of MOD maps. Processing methods and other considerations for each type of
map are as follows:

(1)  Dot-type maps: In dot maps, the value for each point can
be appended to the point's location. Zero values can be
indicated to contrast with unknown values. Dot maps may
also be used to illustrate absence or presence without
any indication of numerical values or rates. Each point
and its value are drawn directly from the projected
(scaled) locations.

(2)  Shading-type maps: For shading maps, the user must
supply the interval values and the symbols which are to
distinguish the value level representing each interval.
Since these indications will be
punched onto cards, special provision must be made in
specifying non-standard computer characters for the
shading symbols. A common plotter practice is to predefine
these non-standard symbols by numbers or short
words. These numbers or words are then used in the
symbol designations. Later, the symbols can be assigned
automatically in a standard sequence of increasing
density. Non-applicable areas could appropriately be
assigned a special symbol. The map would be drawn from
the (scaled) grid points. Each grid box could be shaded
individually. Alternatively, adjacent boxes possessing
the same shading value could be shaded at one time.
(This would require some additional processing, but would
substantially speed the plotting operation.)

(3)  Contour-type maps: The desired contour intervals must
be provided by the user. Each interval could have an
equal increment or each increment could be explicitly
requested. The usual contour technique is to draw all
the appropriate contour lines within each grid box, one
at a time, and then proceed to the next grid box.
Provision must be made to end the contours at user-specified,
non-applicable locations obtained from the
Dictionary File. The values of the contour lines could
be indicated on the map. (Shading and contour intervals
for certain data might be too small to be read.
Such a situation could be determined prior to
mapping and remedied either by terminating, with
an appropriate message to the user, or by automatic
selection of a more suitable type of map.)

(4)  Combination-type maps: Some combinations of
these types can often produce more meaningful maps
than can a single type. A contour map which also
indicates the original data points would be of
value where the contours were a better representation
of [...]. A contour map in which the areas
between contour lines were shaded would graphically
relate similar values and distinguish peaks and
valleys. It is theoretically possible to combine
meaningfully shading and dot maps, but there are
technical limitations since their representation
by the plotter would often be unreadable.
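
The additional processing mentioned for shading maps, in which adjacent
boxes with the same shading value are shaded at one time, can be sketched
for a single grid row. The function name and the run layout (symbol,
first column, number of boxes) are assumptions for illustration only.

```python
from itertools import groupby


def shade_runs(row_symbols):
    """Collapse one grid row into (symbol, first_column, box_count) runs."""
    runs = []
    col = 0
    for symbol, group in groupby(row_symbols):
        n = len(list(group))
        runs.append((symbol, col, n))
        col += n
    return runs


if __name__ == "__main__":
    # Six grid boxes reduce to three shading operations.
    print(shade_runs("aabccc"))
```

Each run could then be shaded in one pass of the pen rather than box by
box, which is the saving in plotting time that the text anticipates.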

7.4.3  MULTIPLE  OUTPUT 

A map presents a pictorial description of the retrieved data, but
this portrayal is limited to location and value. It would often be
desirable to augment a map with a verbal description of some pertinent MOF
associated with each data point. A report accompanying the map could
describe the data points in terms of the summarized points, their component
points, or both, or the narrative accompanying their component points.
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ABSTRACT  -  This  section  discusses 
operational procedures and considers
how the MOD system can be most effectively
used. In the "Notes to user",
inherent limitations of the map form
as a means of presenting information
are discussed; also restrictions imposed
by the data base. Potential
applications of the MOD system are
considered, and several examples are
given.


"The interpretation of knowledge must take
ignorance into account."

Professor Levy


The (implemented) MOD system, including its data bank, is simply the
means to an end. It must be used effectively to give insight into
disease/environmental situations, helping the user to arrive at informed
decisions which will lead to appropriate action.


THE MOD SYSTEM - PAST  PRESENT  FUTURE


The  true  purpose  of  knowledge  resides 
in  the  consequences  of  directed  action. 

John  Dewey 


Continuing  experience  with  the  operational  system  will  be  necessary 
to  reveal  all  of  the  ways  in  which  the  MOD  system  can  be  used  effectively. 
Obviously,  the  details  of  such  usage  cannot  be  given  now,  but  as  a  guide 
to  our  development  of  the  system,  a  basic  pattern  of  output  usage  was 
formulated.


8.1  OPERATIONAL PROCEDURES


The MOD system is unique and sufficiently different from other systems
to  require  detailed  instruction.  This  instruction  will  be  provided  by  a 
"user's  manual"  which  will  explain  the  language  and  the  detailed  procedures 
for  operation.  The  major  steps  are  as  follows: 

(1)  User  conceives  of  an  idea  or  hypothesis  that  he  wants 
to  test  with  the  MOD  system. 

(2)  User writes out a rough-draft preliminary query,
including retrieval conditions (disease/environmental
factors and geographic locations/areas), synthesis
specifications, and output specifications for the kind
of map desired.

(3)  Data  analyst,  in  conjunction  with  user  and/or  data 
consultant,  rephrases  query  in  terms/format 
acceptable  to  MOD  system. 

(4)  Query  is  keypunched. 

(5)  Query is batched with others and fed into system.
(Errors are returned and corrected, then re-entered
by using procedures previously outlined in steps 3 or 4.)

(6)  MOD system retrieves data points from Data File,
manipulates them, and produces maps (sometimes
accompanied by supplemental reports), each showing the
distribution of the areal variations of one disease/
environmental factor.

(7)  User takes the maps and compares them, ordinarily by
overlaying them on each other and on (published/drawn)
base maps (taken from the MOD map library) to determine
pattern fit, including, perhaps, variations in pattern
related to year, season, etc.

(8)  User observes new interrelationships (not new data)
and gains new perspectives and increased understanding
of the disease/environmental situation (e.g., salient
disease-environmental relationships discovered/
confirmed/disproved and/or pertinent modifications
that need to be made in data collection/data files
and/or better ways to phrase the old query or to
formulate a new query in order to generate additional
information).


(9)  User draws conclusions, makes (informed) decision(s),
and initiates whatever action(s) is deemed
desirable/necessary.

Figure 8-1 illustrates, in schematic fashion, the various steps
that are followed in the MOD system: collecting data, preprocessing it
for computer input, manipulating it in response to query, and outputting
it as information in accordance with the user's specifications.


Figure 8-1  Overall pattern of MOD system usage.


8.2  NOTES TO USER

In the MOD computerized system it is the user, not the system, who
makes correlations between the raw data and output map, evaluating the
various factors which make the map look as it does. The computer system
will not perform analysis of the maps produced nor will it make judgments;
it will merely manipulate (according to rigidly defined algorithms)
extracted, formatted data (from that pool of data which was previously put
into the system) and output these manipulated data in the form of maps or
other reports, in the manner specified.

As has been discussed before, it is the mandatory responsibility of
the user to understand maps and their use, in general, before attempting
to interpret specific maps produced by any system. The potential user of
the MOD system will require considerable orientation and training in three
areas: logic (to pose the query); cartography (to understand what a map
is, what it can do, etc.); and the biomedical disciplines (to understand
the limitations of data, including what kinds can and cannot be manipulated
and mapped). The effective use of maps involves competence on the part of
both the compiler and the reader with respect to three fundamental factors:
an understanding of map scales, how to determine position, and how to
present the data in a form that can be readily assimilated.

For example, various situations may all result in similar-appearing
blank areas on a disease-distribution map. The MOD system user must be
aware of several possible causes of such blank areas if he is to interpret
the map correctly. Some of these possible causes are:

(1)  The disease was looked for but found to be absent. (Ideally,
this is what all blank areas should indicate, but this
ideal is a very long way from fulfillment.)

(2)  The disease has never been looked for, or diagnosed,
or reported from that region (but may be present there).

(3)  The disease is present but has been incorrectly
diagnosed and reported as something else.

(4)  The region mapped is uninhabited.

Before making his query the potential user of the MOD system should
be required to check the MOD Map Library catalog to see whether or not the
map he wants has already been requested and output for a previous worker.

We  suspect  that,  in  the  study  of  disease  and  environmental  situations, 
assuming  full  operation  of  the  MOD  system,  most  of  the  disease  maps  will  be 
computer-produced,  while  most  of  the  environmental  maps  will  either  be  found 
already  published  in  suitable  form  or  will  be  manually-produced  from  data 
presented  in  books,  atlases  or  major  reports. 

When the user (usually a professional biomedical person) compares the
maps, he will do so visually on a trial and error, i.e., subjective basis.
For example, he will look at and compare a map showing the distribution of
leptospirosis with another map showing the distribution of rainfall in the
same area and conclude, perhaps: "leptospirosis is related to rainfall".
His basic operating assumption is that a similarity of distribution patterns
on maps implies some relationship among the factors mapped.

Maps produced by the computer can be output on transparent material
which can then be combined (matching geographic points) with other factor
maps. The practical limit to the number of overlays is probably quite low,
for the whole purpose of this type of data processing is to simplify the
situation being considered so that relationships are clarified. If the
patterns exhibited by the disease and environmental factors are similar
(i.e., they match), some relationship can be assumed between the disease
and environmental factors. However, only further study can determine the
nature of that relationship: whether it is causal or merely associative.

Figure  8-2  illustrates  the  various  types  of  such  relationships. 

Other ways by which existing maps could be compared with data
contained in the MOD system data files involve manipulating the existing
map: The data it contains could be digitized and input to the computer
files. The map could be redrawn manually to a different scale and
photographically reduced or expanded to make a new hard copy of the
appropriate scale. A relatively simple and inexpensive way of comparing
maps of the same projection, but different scale, is to make (photographic)
transparencies and project each of these on the same screen simultaneously,
using separate slide projectors, adjusting projector distances so that the
maps superimpose. Alternatively, the MOD map could be manipulated: the data
existing in the MOD data files could be mapped on a new projection, scale
or other basis to fit the base map.

[Figure 8-2 diagram: five panels labeled DIRECT CAUSAL RELATIONSHIP,
INDIRECT CAUSAL RELATIONSHIP, INCOMPLETE OR PARTIAL CAUSAL RELATIONSHIP,
ASSOCIATIVE RELATIONSHIP, and ACCIDENTAL RELATIONSHIP. Legend symbols
denote: disease (data) under study; environmental factor (data) under
study; other environmental factors not directly under consideration;
direct cause-and-effect relationship; apparent (or detected or suspected)
relationship; no relationship existing between factors.]

Figure 8-2  Types of relationships among disease and environmental data.
The eye of the observer is evident at the left. Those connections above
the surface are readily seen; those below the surface are not.

Overlaying and visual pattern comparing is a very powerful process
because it permits human detection of relationships which are so complex
that standard mathematical methods would be unable to detect them. Used
in conjunction with a computer, the process of map preparation is greatly
improved, as the user can get an up-to-date map, i.e., distribution pattern
(as far as recorded data is concerned) within a few hours. The user might
want to "clean up" manually parts of a computer-produced map, but this is
relatively simple as compared to preparing the whole map.

8.3  POTENTIAL  APPLICATIONS 


It is appropriate once again to emphasise that the major objective of
the MOD project is to develop a system whereby narrative and tabular data
can be collected and preprocessed (formulated) so that they are suitable
for subsequent computer processing and output in the form of distribution
maps, graphs, tables, and narrative. Although the self-imposed limitations
described previously (input data has mainly concerned the ecology of
schistosomiasis and leptospirosis) narrow the limits of specific output
considered in this report, they do not narrow the potential limits of the
system. The system has been designed to meet certain needs for information
dealing with infectious disease; however, the same system could be used,
with little modification, to analyze the ecologic factors which influence
efficient stockpiling of corn or aluminium, or the ecologic factors which
influence efficient forest preservation or development of recreational
facilities, or the ecologic factors which influence efficient development
and location of community blood banks or Medicare treatment centers, etc.*


The value of a computer system that allows rapid presentation of
current or historic disease/environmental information in tabular, graphic,
or map form is so obvious that it requires no elaboration. But there are
other, less obvious uses of the MOD system.

•  For particular diseases, in relation to particular geographic areas,
the MOD system can provide a very valuable research tool since it makes
possible the rapid presentation, in a vivid way, of relationships among
disease, per se, man, and his environment so that the ecology of disease
becomes more clearly evident, and causally related factors more readily
apparent.

•  Through correlation of many causal factors in relation to the current
situation and the recent past, the MOD system would enable the user to
determine trends and, in this way, to get a reasonable perspective of what
the future might be.

•  The MOD system provides much insight into the minimal requirements of
disease data in order that these data can be computer processed. Thus the
system becomes helpful in preparing data extraction forms for any disease/
environmental situation. In this way, the MOD system can give excellent
support for anyone wishing to carry out a prospective study. It can also
be of considerable value in suggesting ways to evaluate data already on
hand, and in determining the feasibility of a retrospective study.


*  Very  recently  AID  (Agency  for  International  Development)  has  expressed 
great  interest  in  the  MOD  system  as  a  means  of  identifying,  and  character¬ 
izing,  and  locating  (on  maps)  those  disease-environmental  situations  which 
would  probably  interfere  seriously  with  proposed  schemes  for  economic 
development  of  several  Latin  American  Countries. 


8.  Output  Usage 


•  As an important by-product, the mere existence of a computer system
which can manipulate disease/environmental data to yield valuable
information will provide an important stimulus to get more and better data:
data that are more nearly complete as well as more accurate. Furthermore,
the MOD system, by pointing out "bare areas" in the data pool, will direct
attention where it is most needed.


There are two principal ways in which the MOD system could be used to
investigate causal (ecologic) relationships in infectious diseases. First,
one could take a set of variables, the values of which were actually
recorded in relation to a particular disease situation, then determine the
relationships which did exist. Alternatively, one could select a number of
variables thought to be important, then alter these (systematically) to see
if the information output was consistent with what might reasonably be
expected, i.e., whether or not the results made medical sense. Obviously,
both of these approaches have their place:

(1)  To take what did happen and try to determine
why (in the sense of identifying dependent
variables).

(2)  To develop a hypothetical situation and attempt
to predict what might happen under those conditions.


Many  specific  kinds  of  questions  could  be  put  to  the  MOD  system,  for 
examp ie : 

•  Given  particular  environmental  changes,  what  changes 
in  incidence/character  of  a  specific  disease  are  apt 
to  occur? 

• Given the past history and broad trends of a particular
disease/environmental situation, what is the likelihood
that major variations in incidence (i.e., epidemics)
will occur within the foreseeable future?

•  Given  particular  changes  in  a  disease  situation,  what 
specific  environmental  factors  might  have  caused  or 
influenced  these  changes? 

• Given several environmental factors, which one(s)
are most likely to influence distribution of
various animals which may act as intermediate hosts
or reservoirs of disease or -- on the other hand --
which may yield economically valuable products such
as pearls, furs, or food?

• Given several different diseases, what interrelationship,
if any, exists among them? For example, among
protein malnutrition, iron deficiency, tuberculosis,
and hookworm infection, or between influenza and
(subsequent) pulmonary emphysema, etc.


Obviously, the output of the MOD system is "information," information
directed primarily toward helping biomedical scientists:

(1)  Appreciate  more  fully  quantitative  aspects  of 
disease/environmental  data  in  relation  to 
place  and  time. 

(2)  Identify  the  multiple  causal  factors  of  a  given 
disease  and  their  interrelationships. 

(3) Determine the interrelationships, if any, among
several different diseases or conditions
occurring together.

(4) Evaluate the impact of the disease upon
socio-economic aspects of the area, military
operations, etc.

(5) Anticipate the effects of altered ecology on
incidence and manifestations of disease.

(6) Predict variations in incidence and changes
in character of disease that are likely to
occur in the foreseeable future (on the basis
of past history and trend analysis).

8.4 EXAMPLES


In developing the MOD system, operation was simulated using real data,
data that reflected realistic situations. Many of these operations were
limited simply to mapping incidence of a specific disease, but other, more
complex situations have also been explored, as illustrated by the following
examples.


Data concerning the distribution of Burkitt's tumor have been redrawn
into the form in which they would be output by the MOD system (Fig. 8-3).
One base map of Africa is shown (Fig. 8-3A), on which two other maps may be
superimposed. One of these maps (Fig. 8-3C) shows the occurrence of
Burkitt's tumor; the other (Fig. 8-3B) shows those regions in Africa where,
simultaneously, the altitude is under 5,000 feet, the seasonal mean
temperature exceeds 60°F, and the total annual rainfall exceeds 20 inches.
When these maps are overlaid they give the appearance shown in Fig. 8-3D.
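
The overlay of Fig. 8-3B on Fig. 8-3C amounts to testing, location by
location, whether three environmental conditions hold simultaneously and
whether the disease was reported there. The following is a minimal sketch
in Python, not MOD code: the Site record, its field names, and the four
sample sites are invented for illustration, and the altitude, temperature,
and rainfall limits are passed as parameters whose defaults follow the text
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One map location; every field name here is hypothetical."""
    name: str
    altitude_ft: float
    seasonal_mean_temp_f: float
    annual_rainfall_in: float
    tumor_reported: bool


def in_shaded_area(site, max_altitude_ft=5000.0, min_temp_f=60.0,
                   min_rainfall_in=20.0):
    """True where all three environmental conditions hold at once."""
    return (site.altitude_ft < max_altitude_ft
            and site.seasonal_mean_temp_f > min_temp_f
            and site.annual_rainfall_in > min_rainfall_in)


def overlay_counts(sites):
    """Count sites in each of the four classes produced by the overlay."""
    counts = {"both": 0, "environment_only": 0,
              "tumor_only": 0, "neither": 0}
    for site in sites:
        shaded = in_shaded_area(site)
        if shaded and site.tumor_reported:
            counts["both"] += 1
        elif shaded:
            counts["environment_only"] += 1
        elif site.tumor_reported:
            counts["tumor_only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Invented sample values, for illustration only (not data from Burkitt).
SAMPLE_SITES = [
    Site("lowland, wet", 1200.0, 75.0, 48.0, True),
    Site("lowland, dry", 900.0, 80.0, 8.0, False),
    Site("highland", 7400.0, 58.0, 40.0, False),
    Site("coastal", 50.0, 78.0, 35.0, False),
]

print(overlay_counts(SAMPLE_SITES))
```

A large "both" count relative to "tumor_only" would correspond to the close
pattern match that the overlaid maps are meant to reveal visually.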

Data dealing with the distribution of goiter and the iodine content
of water in the United States provide a second illustration of these
techniques (Fig. 8-4).

For a third example of MOD system usage, we returned to the
standard set of schistosomiasis data used previously in testing the various
computer-mapping programs. Again, we emphasize that this example is offered
only to illustrate technical aspects. With the restrictions imposed by the
limited data being used, one must not draw firm conclusions about the
disease-environmental relationships.

Assume that a user is interested in the relationships among infection
rate of schistosomiasis, rainfall, and temperature in eastern Brazil. He
can ask for separate maps, each showing one of these factors (Fig. 8-5A,B,C),
then overlay them (Fig. 8-5D) to compare their distribution patterns. From
this it appears that July normal temperature does not influence the infection
rate of schistosomiasis, but that total annual rainfall may.

Because of the way in which the MOD files and programs are set up, the
user may query: "What is the infection rate (%) of schistosomiasis due to
Schistosoma mansoni in man, in eastern Brazil, where (simultaneously) the

Figure 8-3 Data concerning Burkitt's tumor
(Burkitt, 1962, p. 77-78; used by permission)
recast into a MOD-like output form: A, a base
map of Africa; B and C, maps which would be
output by the MOD computer system to show: in B,
areas (shaded) where these three conditions
exist simultaneously -- altitude is under 3000
feet, seasonal mean temperature always exceeds
60°F, and total annual rainfall exceeds 20
inches; and, in C, occurrence (dots) of Burkitt's
tumor. D shows maps A, B, and C overlaid to
evaluate the extent of pattern match.

Figure 8-4 Data implying relationship between
goiter and iodine content of drinking water --
Henschen, 1962, p. 190, about 1920 -- rearranged
into MOD-like format. A, a base map of U.S.;
B, a MOD map-like output showing areas (shaded)
with iodine content of drinking water low (less
than 0.2J parts per liter); C, a MOD map-like
output showing areas (shaded) with goiter
frequent (5 or more cases per 1000); D, maps of
A, B, and C overlaid to show similarity of the
distribution patterns of the two factors.

Figure 8-5 Maps showing distribution of temperature, rainfall, and
schistosomiasis data in eastern Brazil: A, July normal temperature, °F
(Rand McNally, 1964, p. 11); B, Annual rainfall, inches (Rand McNally,
1964, p. 97); C, Infection rate, %, of schistosomiasis mansoni in man,
based upon data grouped by province (taken from Malek in May, 1961, p.
305-6), drawn manually by the MOD study team; D, a map made by overlaying
A, B, and C (70°F contour from A is shown as a dotted line; 60-inch
rainfall contour from B is shown as a solid line; 0, 10, 20, and
30% (infection rate) contours from C are represented by dashed lines); E,
Dot-type map showing data points that would have been retrieved and output
by MOD system in response to query asking for the combined factors --
infection rate of schistosomiasis mansoni in man where, simultaneously,
July normal temperature is under 70°F and total annual rainfall exceeds
60 inches; F, contour-type map drawn from data points of E.

annual rainfall exceeds 60 inches and the July normal temperature is less
than 70°F?" In this case he would receive a single map (Fig. 8-5E,F), based
only upon those data points which satisfied all the query conditions. This
single map presents a distribution pattern which, when compared with the
three separate maps (Fig. 8-5A,B,C), gives little insight into the over-all
disease/environmental situation; nevertheless, it describes a particular
situation and does present potentially useful information.
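
The retrieval step behind such a query can be sketched as a filter that
keeps only those data points satisfying every condition at once. The Python
below is an illustration under stated assumptions, not the MOD retrieval
program: the record keys (rain_in, july_temp_f, infection_pct), the
(factor, operator, limit) form of a condition, and all of the data values
are invented.

```python
import operator

# Comparison operators that a query condition may use.
OPERATORS = {"<": operator.lt, ">": operator.gt}


def retrieve(points, conditions):
    """Return the points that satisfy every (factor, op, limit) condition."""
    selected = []
    for point in points:
        if all(OPERATORS[op](point[factor], limit)
               for factor, op, limit in conditions):
            selected.append(point)
    return selected


# Invented data points; keys and values are hypothetical.
POINTS = [
    {"lat": -8.0, "lon": -35.0, "rain_in": 70.0,
     "july_temp_f": 68.0, "infection_pct": 25.0},
    {"lat": -12.9, "lon": -38.5, "rain_in": 75.0,
     "july_temp_f": 74.0, "infection_pct": 30.0},
    {"lat": -19.9, "lon": -43.9, "rain_in": 55.0,
     "july_temp_f": 64.0, "infection_pct": 12.0},
    {"lat": -20.3, "lon": -40.3, "rain_in": 62.0,
     "july_temp_f": 69.0, "infection_pct": 8.0},
]

# Rainfall over 60 inches and July normal temperature under 70 degrees F.
QUERY = [("rain_in", ">", 60.0), ("july_temp_f", "<", 70.0)]

for p in retrieve(POINTS, QUERY):
    # Each retrieved point would become one dot on a map like Fig. 8-5E.
    print(p["lat"], p["lon"], p["infection_pct"])
```

Because the conditions are combined with a logical "and", points failing any
one of them drop out, which is why the resulting single map carries less
context than the three separate maps.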

The user could also request graphs (discussed earlier under Output
Analysis) showing either schistosomiasis-rainfall-temperature or
schistosomiasis-rainfall plots. But, again, it seems that relationships
among the disease and environmental factors are most effectively shown (at
least in the early stages of an investigation) by obtaining and comparing
visually a group of separate maps, each displaying the geographic
distribution of one simply-stated factor.

A fourth illustration makes use of some paleontological taxonomic
and ecologic data (Ray, 1967) to explore a problem quite remote and far
afield from the basic medical objectives of the MOD project. One of the
reasons for this was to demonstrate that the MOD system is applicable to
many areas other than the study of disease. Data which had actually been
used in a study employing maps were recast into MOD-like output, then
examined, leading to the same conclusions that were drawn by the original
worker.

The first illustration (Fig. 8-6) shows a base map of the East coast
and four maps, each showing only the geographic distribution of one
environmental factor. (Each map was originally drawn on translucent overlay
paper.)

Fossil walrus tusks of uncertain age, but possibly as old as several
million years, have been found along the East coast from New England to
Florida. If all these tusks are of Pleistocene (Ice Age) age and represent
the living, cold-water species of walrus, it would seem that cold climate

Figure 8-6 MOD-type maps, each showing the distribution of one
environmental factor pertinent to study of fossil walruses; see Figure 8-7.

extended as far south as Florida. However, much other evidence indicates
that Florida was only slightly cooler during the Ice Age than at present.
Thus, there is a dilemma.

Close  examination  of  the  fossil  walrus  tusks  shows  that  two  morpho¬ 
logically  different  kinds  occur,  and  that  each  occurs  in  a  different  geo¬ 
graphic  region  (Fig.  8-6  and  -4,  -5). 

Since the time when walruses became evolutionarily distinct from
seals, the East Coast has been submerged by marine waters twice: once
during the Late Miocene (12-17 million years ago), and again during the
Pleistocene (10,000-2,000,000 years ago). Studies of the deposits laid down
during these submergences allow us to map the shorelines of these ancient
seas (Fig. 8-6 and -2, -3).

When these shoreline maps are overlaid with the tusk-occurrence maps,
it is immediately evident that the two kinds of fossil walrus tusks, in
addition to being distinct morphologically and biogeographically, are also
distinct paleoenvironmentally. (Remember that walruses are marine, not
terrestrial animals.) The kind that is identical to the living cold-water
walrus occurs predominantly in regions which were sea during Pleistocene
time (but land during Late Miocene time); the other kind occurs in regions
which were sea in Late Miocene time (but land during Pleistocene time), as
shown in Fig. 8-7.
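
The comparison made by overlaying the shoreline and tusk-occurrence maps is,
in effect, a cross-tabulation of tusk kind against paleoenvironmental
setting. A small Python sketch of that tabulation follows; the record layout
and the five finds are invented for illustration and are not the data of
Ray (1967).

```python
from collections import Counter


def setting(miocene_sea, pleistocene_sea):
    """Name the overlay class of a find site from the two shoreline maps."""
    if miocene_sea and pleistocene_sea:
        return "sea in both"
    if miocene_sea:
        return "Late Miocene sea only"
    if pleistocene_sea:
        return "Pleistocene sea only"
    return "land in both"


def cross_tabulate(finds):
    """Count finds by (tusk kind, paleoenvironmental setting)."""
    table = Counter()
    for kind, miocene_sea, pleistocene_sea in finds:
        table[(kind, setting(miocene_sea, pleistocene_sea))] += 1
    return table


# Invented records: (tusk kind, under sea in Late Miocene?, in Pleistocene?)
FINDS = [
    ("living-type", False, True),
    ("living-type", False, True),
    ("living-type", True, True),
    ("other", True, False),
    ("other", True, False),
]

for (kind, where), n in sorted(cross_tabulate(FINDS).items()):
    print(kind, "|", where, "|", n)
```

If each tusk kind falls mostly into a single setting, as in these invented
records, the table shows the same separation that the overlaid maps show.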

Thus, we resolve the apparent dilemma by concluding, on the basis of our
maps, that the more northern group of tusks are Pleistocene representatives
of the living cold-water walrus species, which ranged south only to North
Carolina during the Ice Age, while the more southern tusks represent an
earlier (Late Miocene), now-extinct, warmer-water walrus species which
ranged as far south as Florida.
It would be difficult
indeed to provide a meaningful short
summary of the content of each of the
preceding eight sections, and that
has not been attempted here. This
section presents a general summary
as a basis for drawing conclusions
about the MOD effort and making
specific recommendations.


Rene J. Dubos has pointed out (in his foreword
to "Attenuated Infection", 1960) that a very
large amount of relevant information is
available in the published literature which
has remained virtually unnoticed because it
has not been integrated in a meaningful
pattern and correlated with the natural
events of disease.
9.1 GENERAL SUMMARY

The MOD project represents the first serious effort (to our knowledge)
to develop a computerized system for mapping disease, coupled with a
comprehensive data file of ecologic factors. Our goal was to provide a system
whereby disease and environmental data could be manipulated together in an
appropriate (geographic) location and time context, with direct computer
(/line-printer or /plotter) output in the form of distribution maps, or
block diagrams -- with supplemental narrative reports as required.

The implemented system would have a capacity for producing quickly,
and easily, up-to-the-moment maps that show the distribution patterns of
diseases and causally related factors. In addition, the system would be
an important research tool for those persons searching for new causal
relationships, and/or attempting to predict changes in location patterns or
incidence  of  disease.  Obviously,  an  effective  research  method  for  linking 
contributing  and  precipitating  factors  with  a  given  disease  would  aid  in 
many  ways  our  understanding  of  the  etiology  of  disease.  If  the  cause  of 
the  disease  were  unknown,  it  would  be  a  means  to  define  the  communities 
with  different  incidences,  and  to  analyze  the  differences  between  these 
communities  as  to  environmental  and  other  factors.  In  this  way,  factors  of 
high  correlation  could  be  found,  giving  clues  as  to  etiology  and  pointing 
to  specific  basic  research  which  would  be  likely  to  define  etiology  and/or 
disclose  methods  of  control. 

In connection with the uses of the MOD system which we have envisioned,
two excerpts from Professor A. Payne's "Statement on Epidemiology", made to
the Executive Board of WHO (20 January 1966) are pertinent:

In the last analysis it is the ecology of an area which
determines what diseases might become serious problems as
conditions are changed in the process of development, or
should any of a variety of agents be introduced.
Knowledge of it therefore has a predictive value enabling
one to foresee future dangers so that preventive action
can be taken in good time.


9. General Summary


The second area of research which I would mention
involves the long-term development of ecological
maps of the world, including the distribution of
infectious agents, vectors, reservoirs and
ecological conditions. I would emphasize that
this is a long-term objective, but one which
would lead to major advances in predictive
epidemiology and communicable disease surveillance.

The  immediate  result  of  the  MOD  effort  was  envisioned  as  an  operational 
computer  system  consisting  of  two  major  components: 

•  An  information  storage  and  retrieval  system 
specifically  designed  for  disease-environmental 
data 

•  A  graphic  output  system  that  would  manipulate 
retrieved  data  and  present  them  in  the  form 
of  maps  (principally),  block  diagrams,  graphs, 
and  narrative  reports. 


An  additional  important  result  was  to  be  the  description  of  methods 
and  techniques  necessary  to  select,  extract,  evaluate,  and  preprocess  "raw" 
narrative,  tabular,  and  graphic  data  so  that  they  could  serve  as  effective 
input  to  the  storage  and  retrieval  system. 


One  of  the  last  items  that  was  to  be  produced  for  the  MOD  system  was 
a  user's  manual.  This  was  considered  necessary  because  the  MOD  system  will 
provide  a  unique  capability,  one  with  which  the  potential  user  will  have 
had  no  experience. 


Financial  support  was  anticipated  for  a  period  of  three  years;  it  was 
provided  for  only  two  years  and,  as  a  result,  the  MOD  system  was  not  carried 
to  the  point  of  implementation.  However,  the  system  analysis  and  design 
have  both  been  completed  (with  the  exception  of  several  aspects  of  system 
design  that  need  further  elaboration,  but  require  that  this  be  performed 
in  the  context  of  a  partially  implemented  system).  Furthermore,  data 
characteristics  have  been  extensively  analyzed  as  to  sources,  limitations 
of  the  data,  per  se,  and  problems  involved  in  preparing  these  data  for 
computer input. A method for structuring data has been designed and tested,
and a comprehensive factor catalogue has been produced. In addition, through
our analysis of maps and cartographic techniques, we have gained new insight
into the characteristics of disease-environmental data that allow them to be
mapped, and have developed data extraction forms reflecting these
requirements.

*  *  * 

Work on the MOD project has progressed to the point that feasibility
is no longer a question. We have produced many disease-environmental maps
as direct computer/plotter output, proving the validity of our hypotheses
and demonstrating the adequacy of our data and our methods of data
manipulation. We believe that we have developed the MOD system, not to
completion, not to full satisfaction, not to implementation, but with every
expectation of success.

9.2  CONCLUSIONS  AND  RECOMMENDATIONS 

The conclusions and recommendations contained in this section reflect
nearly three years of effort on the MOD project. To state these simply:
(1) The (MOD) system for computerized mapping of disease-environmental data
described herein is feasible; (2) This system would satisfy an important
need for processing data to provide geographically oriented
disease-environmental information; and (3) The MOD system should be
implemented, and can be implemented -- given adequate time, effort, and
financial support.

In particular, data-processing aspects should present no significant
technical problems now that we have developed an effective method for
structuring the data. Readily available computer science concepts,
techniques, and equipment are adequate for this task, and no special
difficulties are anticipated in producing the necessary programs. However,
data-collecting aspects, especially extraction, will require a great effort
(see Fig. 9-1, next page), and it is probably this phase, more than any
other, that will limit use of the system.
TOTAL MOD EFFORT


Many of the methods, techniques, and procedures that we have described
can be implemented independently (to a limited extent), by imposing suitable
restrictions on appropriate parts of the proposed system. Ideally, the
entire system should be implemented. If (funding) priorities do not allow
this, we recommend that the system be implemented in part -- to whatever
degree is permitted by available resources.

Implementation will require two or three competent computer programmers,
working full-time for about one to one-and-a-half years under the
direction of a computer-system analyst, in turn supervised by a professional
biomedical staff of two or three persons (who could also contribute to the
data collection/preprocessing effort). At least two or three biomedical
professional, several semi-professional, and several clerical personnel
will be required for the data-collecting-preprocessing efforts. Once the
system becomes implemented, the number of data-collecting-preprocessing
personnel required will depend entirely upon user requirements -- more
specifically, the character and extent of the data base file required for
effective response to the queries. A suggested table of staff organization
is shown in Figure 9-2.


* * *


Requirements of the MOD system have been specified; feasibility has
been proved; design has been accomplished. Implementation of the MOD
system would provide a powerful new tool to biomedical science. We conclude
and strongly recommend that such a system be implemented as soon as possible.


CHIEF  OF 
COMPUTERIZED 
MAPPING  OF 
DISEASE  PROJECT 




References  cited 


The list of references is divided
into two parts: the first, References
Cited -- the second, Selected
Bibliography.

The items which appear among
References Cited are not listed a
second time.




American Geographical Society, World distribution of spirochetal diseases,
3. Leptospiroses: Atlas of Diseases, American Geographical Society,
New York, pl. 17, 1955

Bick, K.F., and Johnson, G.H., Laboratory Manual in Earth Science, Crowell,
New York, 1967, 163 pp.

Bunge, W., Theoretical Geography, Gleerup, Lund, Sweden, 1962, 210 pp.

Burkitt, D., A tumor syndrome affecting children in tropical Africa,
Postgrad. Med. J., v. 38 : 71-79, 1962

Datamation,  Automatic  Data  Processing  Glossary,  Thompson,  Greenwich,  Conn., 
1966,  62  pp. 

De Paola, D., Pathology in Brazil -- past and present, International
Pathology, v. 8 : 8-12, 1967

Digital Plotting Newsletter, November-December, Plotter perspective on U.S.
population, California Computer Products Inc., Anaheim, Calif., p. 1,
1966

Dorn, H.F., A classification system for morbidity concepts, Public Health
Reports, v. 72 : 1043-1048, 1957

Espenshade,  E.G.,  Jr.,  editor,  Goode's  World  Atlas,  Rand  McNally,  Chicago, 
1964,  288  pp. 

Fisher, H.T., et al., Introduction to synagraphic computer mapping --
Computer mapping of quantitative and qualitative information,
Introductory Correspondence Course, Laboratory for Computer Graphics,
Harvard University, Cambridge, Mass., 1967

Harbaugh, J.W., A computer method for four-variable trend analysis
illustrated by a study of oil-gravity variations in southeastern
Kansas, State Geological Survey of Kansas, Bull. 171, 1964, 58 pp.

Henschen, F., The History and Geography of Diseases, Delacorte Press, New
York, 1966, 344 pp. (English transl., Tate, J., 1966 -- orig.
ed. 1962)

Howe,  G.M.,  National  Atlas  of  Disease  Mortality  in  the  United  Kingdom,  Royal 
Geographical  Society,  Thomas  Nelson  &  Sons,  London,  1963,  111  pp. 

Kratchman, J., and Grahn, D., Relationships between the geologic environment
and mortality from congenital malformation, TID-8204, Biology & Medicine,
Technical Information Service, U.S. Atomic Energy Commission, Washington,
D.C., 1959, 20 pp.




Learmonth, A.T.A., Health in the Indian subcontinent 1955-1964, Australian
National University, Department of Geography, Occasional Paper 2,
1965, 80 pp.

_____, & Nichols, G.C., Maps of some standardised mortality ratios
for Australia 1959-1963: Australian National University, Department of
Geography, Occasional Paper 3, 1965, 35 pp.

Lobeck, A.K., Block diagrams and other graphic methods used in geology and
geography, 2nd ed., Emerson-Trussell, Amherst, Mass., 1958

Malek, E.A., The ecology of schistosomiasis, in Studies in Disease Ecology,
May, J.M., editor, Hafner, New York, 1961, pp. 261-32?

May,  J.M.,  editor,  Atlas  of  Diseases,  American  Geographical  Society,  New 
York,  1950-55 

Nature,  Maps  by  machine.  Nature,  v.  213  :  1166-1167,  25  March,  1967 

Nature,  Instant  maps.  Nature,  v.  214  :  230-231,  15  April,  1967 

Ohman, H.L., Guidelines for constructing isopleth maps, Spec. Rept. S-2,
Earth Sciences Division, U.S. Army Quartermaster Research &
Engineering Center, Natick, Mass., (Proj. Ref. 1KO-2500I-A129), 1963,
19 pp.

O'Leary, M., Lippert, R.H., and Spitz, O.T., FORTRAN IV and MAP program
for computation and plotting of trend surfaces for degrees 1 through 6,
State Geological Survey of Kansas, Computer Contribution 3, 1966, 48 pp.

Osborn, R.T., An automated procedure for producing contour charts, Informal
Manuscript IM 67-4, U.S. Naval Oceanographic Office, Washington, D.C.,
1967, 48 pp.

Petterssen, S., Introduction to Meteorology, 2nd ed., McGraw-Hill, New York,
1958, 327 pp.

Pfaltz, J.L., and Rosenfeld, A., Computer representation of planar regions
by their skeletons, Communications of the Association for Computing
Machinery, v. 10 : 119-125, 1967

Preston, F.W., and Harbaugh, J.W., BALGOL programs and geologic application
for single and double Fourier series using IBM 7090/7094 computers,
State Geological Survey of Kansas, Spec. Distrib. Pub. 24, 1965, 72 pp.

Ray, C.E., personal communication, 1967

Richardson, M.A., and Rollett, J.S., The Oxford Cartogra, Data Bank -- A
feasibility study of accuracies, store sizes and operation time, presented
at the Third International Conference on Cartography, Amsterdam, 17-22
April 1967




Robinson, A.H., Elements of Cartography, 2nd ed., John Wiley, New York,
1960, 343 pp.

Rodenwaldt, E., editor, World-Atlas of Epidemic Diseases, Falk-Verlag,
Hamburg, Germany, 3 vols., 1932-58

Sippl, C.J., Computer Dictionary and Handbook, Howard W. Sams,
Indianapolis, 1966, 766 pp.

Tobler, W.R., Notes on the analysis of geographical distributions,
University of Michigan, Department of Geography, Michigan Inter-University
Community of Mathematical Geographers Discussion Paper 8,
part 2, 1966, 14 pp.

_____, personal communication, 1967




Selected  bibliography 


The Selected Bibliography is not meant
to be exhaustive, but to serve as a
guide.


The items which were listed among
References Cited are not included here.




In addition to the "References Cited," we list here a group of selected
references  dealing  with  data  processing,  information  storage  and  retrieval, 
epidemiology,  medical  ecology,  and  cartography  that  have  helped  to  orient  MOD 
project  personnel.  We  believe  that  they  may  be  of  value  to  others  concerned 
with  the  computerized  mapping  of  disease  and  environmental  data. 


Abramson,  N.,  Information  Theory  and  Coding,  McGraw-Hill,  New  York, 
1963,  201  pp. 

Abzug,  I.,  Graphic  data  processing.  Datamation,  v.  11,  no.  1  :  35-37, 
1965 

Adams,  G.P.,  The  use  of  a  computer  to  calculate  isodose  information 
surrounding  distributed  gynaecological  radium  sources,  Phys.  Med. 
Biol.,  v.  9  :  533-540,  1964 

Adams, S., and Taine, S., Searching the medical literature, J. Amer.
Med. Assoc., v. 188 : 251-254, 1964

Affel, H.A., System engineering, PR 7571, Auerbach Corp., (reprint
from Internatl. Science & Tech.), 1964, 9 pp.

Ainsworth,  G.C.,  Storage  and  retrieval  of  biological  information. 
Nature,  v.  191  :  12-14,  1961 

American  Association  for  the  Advancement  of  Science,  Symposium 
on  information  retrieval.  Annual  meeting  of  A.A.A.S., 

Washington,  D.C.,  1966 

American Federation of Information Processing Societies, 1967
Spring  Joint  Computer  Conference  Proceedings,  v.  30,  1967 

American  Geographical  Society,  Research  catalogue  of  the  American 
Geographical  Society,  Map  supplement.  Hall,  Boston,  1962 

American Rheumatism Association, Index of Rheumatology, v. 1,
no. 23, 1965




Andrews, D., and Newman, S., Storage and retrieval of contents of technical
literature -- Nonchemical information, Res. & Dev. Rept. 1, U.S.
Patent Office, 13 pp., 1956

Andrews,  R.D.,  and  Ferris,  D.H.,  Relationships  between  movement  patterns 
of  wild  animals  and  the  distribution  of  leptospirosis.  Wildlife 
Management,  v.  30  :  131-134,  1966 

Anstey,  R.L.,  A  system  for  collation  of  environmental  data.  Res.  Study 
Rept.  RER-27,  Regional  Environments  Research  Branch,  U.S.  Army 
Quartermaster  Research  &  Engineering  Center,  Natick,  Mass.,  1959, 
approx.  30  pp. 

_____, Digitized environmental data processing, rev. ed.,
Res. Study Rept. RER-31, Regional Environments Research Branch,
U.S. Army Quartermaster Research & Engineering Center, Natick,
Mass., 1963, approx. 25 pp.

Arbib, M.A., Brains, Machines, and Mathematics, McGraw-Hill, New York,
1964, 152 pp.

Armstrong,  R.W.,  Computer  graphics  in  medical  geography.  Proc.  Internatl. 
Geogr.  Union,  Latin  Am.  Regional  Conf . ,  v.  6  :  69-74,  1966 

Artandi, S., Investigation of systems for the intellectual organization
of information, Grant NSF GN99, Rutgers Univ., New Brunswick, N.J.,
1964, 44 pp.

Asero, J.J., Find that fact, Army Information Digest, v. 21, no. 3 :
45-49, 1966

Atkins, H., editor, Proceedings of a One-Day Symposium on Progress in
Medical Computing, Elliott Medical Automation Ltd., London, 1965

Auerbach  Corp. ,  Cancer  Chemotherapy  Abstracts.  National  Cancer 
Institute,  Bethesda,  Md.,  v.  6,  no.  7-8,  1965 

_____, A description of Auerbach Information Management System,
Auerbach Corp., Philadelphia, 1966, approx. 50 pp.

Auerbach, I.L., The impact of information processing on mankind, PR 7603,
Auerbach Corp., (reprint from Proc. of I.F.I.P. Congress
1962), 5 pp.

_____, Employment implications of automation, PR 7558,
Auerbach Corp. (Senate Subcommittee on Employment & Manpower),
1963, 22 pp.

_____, The role of the systems designer/consultant, PR 7654,
Auerbach Corp. (U.S. Army Automatic Data Processing Seminar for
General Staff Officers), 1964, 10 pp.

_____, Tomorrow's outlook for EDP, PR 7623, Auerbach Corp.,
(reprint from J. of Data Management), 1964, 6 pp.




Aumen,  W.C.,  A  new  map  —  The  numerical  (digital)  map  (abstr.), 

ACSM-Abi  1966  Convention  Prog.,  p.  1,  1966 

Austin,  C.J.,  Data  processing  aspects  of  MEDLARS.  Bull.  Med.  Libr. 

Assn.,  v.  52  :  159-163,  1964 

_ ,  The  MEDLARS  System  --  An  application  report.  Datamation, 

v.  10,  no.  12  :  2b-31,  1964 

Baca, A., Prediction of the performance of a solution gas drive reservoir
by Muskat's Equation. Kan. Geol. Surv., Comp. Contr. 8, 1967, 35 pp.

Bahn, R.C., et al, An information-retrieval system for research associated
with the postmortem examination, Mayo Clin. Proc., v. 39 :
835-840, 1964

_ , Potential uses of a digital computer in the Section of
Experimental & Anatomic Pathology, Mayo Clin. Proc., v. 39 :
830-834, 1964

Baker, J.J., Scanning text with a 1401, Communications of the ACM, 1964

Baker, M.L., Automatic map compilation equipment for altitude measurements
and  orthophoto  productions  (abstr.),  ACSM-ASP  1966  Convention  Prog., 
p.  6,  1966 

Barrer,  L.A.,  Machine  inventory  of  human  pathological  specimens,  Proc.  4th 
IBM Med. Symp., pp. 105-108, 1962

Bartcher, R.L., FORTRAN IV program for estimation of cladistic relationships
using the IBM 7040, Kan. Geol. Surv., Comp. Contr. 6, 1966, 54 pp.

Bartholomay, A.F., The mathematical approach to the study of discrete
biological events. Proc. 6th IBM Med. Symp., pp. 325-349, 1964

Baruch, J.J., and Barnett, G.O., Joint venture at Massachusetts General,
Datamation, v. 11, no. 12 : 29-33, 1965

Basile, A.S., The key to automated cartography -- A precision digital
plotter system, Amer. Cong. on Surveying & Mapping, 1964 Regional
Convention, Kansas City, Mo., 1964

Bassett,  F.J.,  Machine  information  retrieval  —  An  annotated  introduction 
and  protection.  Bull.  Med.  Libr.  Assoc.,  v.  51  :  p.  221-225,  1963 

Battelle Memorial Institute, Directory of selected specialized information
services, Ad-Hoc Forum of Scientific and Technical Information
Analysis  Center  Managers,  Directors,  and  Professional  Analysts,  held 
at  Battelle  Memorial  Institute,  Columbus,  Ohio,  124  pp.,  1965 

Baum, C., and Gorsuch, L., editors. Proceedings of the Second Symposium
on Computer-Centered Data Base Systems, Tech. Mem. TM-2624/100/00,
System Development Corp., Santa Monica, Calif., 1965, approx. 200 pp.




Becker, J., and Hayes, R.M., Information storage and retrieval -- Tools,
elements, theories. John Wiley, N.Y., 1963, 448 pp.

Beckwith, H.M., Automatic map compilation system. Contract DA 44-009-eng-4596,
FR, phase 1; RW-C129-2U2, Thompson Ramo Wooldridge Inc.,
Canoga Park, Calif., 1962, 83 pp.

Behrens,  C. ,  Computers  and  security.  Science  News,  v.  91  :  532-533,  1967 

_ ,  Computers  that  hear,  Science  News,  v.  91  :  214,  1967 

Bell, D.A., Information Theory and Its Engineering Applications, 3rd ed.,
Pitman, N.Y., 1962, 196 pp.

Bell Telephone Laboratories, Electronic graphics by computer, Science,
v. 156 : 8, 1967

Benson-Lehner  Corp.,  Computer  Graphics  (various  issues),  1965-67 

_ ,  LTE  &  STE  FORTRAN  IV  PLOT  subroutines.  Publ.  548, 

Benson-Lehner  Corp.,  Van  Nuys,  Calif.,  1966,  55  pp. 

_ , LTE & STE plotting systems, Publ. 551 DAB, Benson-Lehner
Corp., Van Nuys, Calif., 1966, 48 pp.

Berman, M., Incomplete data and models, Proc. 6th IBM Med. Symp.,
pp. 647-653, 1964

Bernard, J., and Schilling, C.W., Accuracy of titles in describing content
of biological sciences articles, Biological Sciences Communication
Project, Amer. Inst. of Biol. Sci., Washington, D.C., 1963, 90 pp.

Bernstein,  R.A.,  How  Computers  Work  —  Operation  Update,  Workbook  No.  1, 
Factory, Reader Service Dept., N.Y., 1962, 10 pp.

Bertram, S., and Beckwith, H.M., Type II interim technical report for
the automatic map compilation system. Rept. C129-109, Contract
DA 44-009-eng-4596, Thompson Ramo Wooldridge Inc., Canoga Park,
Calif., 1961, 43 pp.

Berul, L., Information storage and retrieval -- A state-of-the-art report.
PR 7500-145, Auerbach Corp., Philadelphia, 1964, approx. 100 pp.

_ , Methodology and results of the DOD User Needs Survey,
PR 7500-130, Auerbach Corp., Philadelphia, 1965, 24 pp.

Blakesley, R.G., The planning databank challenges the surveyor (abstr.),
ACSM-ASP 1966 Convention Prog., p. 6, 1966

Blumenstock, D.I., The reliability factor in the drawing of isarithms.
Annals of Assoc. of Amer. Geographers, v. 43, 1953

Bobrow, D.G., et al, The BBN-LISP System. Sci. Rept. 1, Contract AF
19(628)-5065, Bolt Beranek and Newman Inc., Cambridge, Mass.,
1966, 82 pp.




Boggess, W.R., and Russell, R.L., Stream flow patterns on the Lake
Glendale watershed in southern Illinois, Univ. of Illinois, Dept.
of Forestry, Forestry Note 110, 1964, 5 pp.

Bolton, R.M., The potential of scanners in cartography (abstr.),
ACSM-ASP 1966 Convention Prog., p. 7, 1966

Bonato,  R.R.,  A  general  cross-classification  program  for  digital  computers, 
Behav. Sci., v. 6 : 347-357, 1961

Bonner, R.E., et al, DAP -- A diagnostic assistance program, Proc. 6th
IBM Med. Symp., pp. 81-108, 1964

Borko,  H. ,  and  Bernick,  M.D. ,  Toward  the  establishment  of  a  computer- 
based  classification  system  for  scientific  documentation,  Tech. 

Memo.  TM-1763,  System  Development  Corp.,  Santa  Monica,  Calif., 

1964, 49 pp.

Bosch,  R. ,  Account  numbering  and  identification  systems,  PR  7686,  Auerbach 
Corp.,  (Am.  Bankers  Assoc.  Savings  Bank  Workshop),  1964,  12  pp. 

Bousky, S., Scanning techniques for light modulation recording. Tech.
Rept. AFAL-TR-66-188, Contract AF 33(615)-2632, Ampex Corp.,
Redwood City, Calif., 1966, 141 pp.

Brain, A.E., et al, Graphical data processing research study and experimental
investigation, Quart. Progr. Repts. (1,2,3,4,5,6,7,3,15), Contract
DA-36-039-SC-73343, Stanford Research Institute, Menlo Park, Calif.,
1960-64

Breeding, K.J., et al, Order code for the film scanners of ILLIAC III,
Rept. 1 7 1), Dept. of Computer Science, Univ. of Illinois, Urbana,
1963, 20 pp.

Breimann,  R.J.,  Harvard  picks  Alexandria  for  computer  map  making.  Evening 
Star.  Washington,  D.C.,  1  Mar.  1966  issue 

Briggs, L.I., and Pollack, H.N., Digital model of evaporite sedimentation,
Science,  v.  155  :  453-456,  1967 

Brown, J., and Wagner, D., Subsystem for the digital lou.'ng and remote
display of curved lines. Rept. IST-d^OO-l 19- f, Contract DA 36-039-sc-78801,
Inst. Sci. & Tech., Univ. of Michigan, 1960, 3L pp.

Brunelle, R.H., Systems programs to accommodate biomedical research,
pp. 127-14u of Stacy & Waxman, Computers in Biomedical Research,
Academic Press, New York, 1965

Buck, C.P., et al, Investigation and study of graphic-semantic composing
techniques, Rept. Eng. n95-fcl5F, Contract AF 30(602)2091, Research
Inst., Syracuse Univ., Syracuse, N.Y., 1961

Burck, G., The boundless age of the computer, in six parts. Fortune:
March, 101-; April, 141-; May, 133-; June, ilj-; August, 123-;
October, Oh-, 1964




Burkitt, D., A great pathological frontier. Postgrad. Med. J., v. 42 :
543-547, 1966

_ , and Hutt, M.S.R., An approach to geographic pathology

in  developing  countries.  International  Pathology,  v.  7,  no.  1  : 

106,  1966 

_ ,  and  Wright,  D.,  Geographical  and  tribal  distribution  of 

the  African  lymphoma  in  Uganda.  British  Med.  J.,  v.  1  :  569-573,  1966 

Burzynski,  E.F.,  UNAMACE  --  Universal  Automatic  Map  Compilation  Equipment 
(abstr.), ACSM-ASP 1966 Convention Prog., p. 7, 1966

Cahn, J.N., Closing gaps in biological communications -- Need for a
national voluntary plan for science information in the 70's. Fed.
Proc., v. 22 : 993-1001, 1963

Cain, S.A., A definition of human ecology, Paper presented at Symp. on
Human Ecology, 1966 Annual Meeting of American Association for
Advancement of Science, Washington, D.C., 1966

California  Computer  Products,  Enroot  plotting.  Bull.  123B,  Calif.  Comp. 
Prod.  Inc.,  Anaheim,  Calif.,  1964 

_ , Digital Plotting Newsletter, various issues, 1964-1967

_ , Digital plotting systems, Bull. 1/5C, Calif. Comp. Prod.
Inc., Anaheim, Calif., 1966, 16 pp.

Candarjs, G., et al, Présentation d'un code pour fiches perforées à tri
électronique, Oncologia (Basel), v. 16 : 210-220, 1963

Cannon, H.L., Geochemical relations of zinc-bearing peat to the Lockport
Dolomite, Orleans County, New York, U.S. Geol. Surv., Bull. 1000-D,
pp. 119-185, 1953

_ , The development of botanical methods of prospecting for
uranium on the Colorado Plateau, U.S. Geol. Surv., Bull. 1085-A,
pp. 1-50, 1960

_ , The biogeochemistry of vanadium. Soil Science, v. 96 :
196-204, 1963

_ , and Bowles, J.K., Contamination of vegetation by tetraethyl
lead, Science, v. 137 : 765-766, 1962

Carlson, W.M., A management information system designed by managers,
Datamation,  v.  13,  no.  5  :  37-43,  1967 

Carpenter, H.M., System for storage and retrieval of data from autopsies,
Amer. J. Clin. Path., v. 38 : 449-467, 1962

_ , Data processing systems in pathology, Biomed. Sci.
Instrum., v. 1 : 25-31, 1963




Casey, R.S., et al, Punched Cards -- Their Applications to Science and
Industry, 2nd ed., Reinhold, New York, 1958, 697 pp.

Caster, W.O., Use of digital computer in study of eating habit patterns,
Amer. J. Clin. Nutr., v. 10 : 98-106, 1962

Census Bureau, Map area computer, U.S. Bureau of Census, 1964, 4 pp.

Chamberlin, W., The Round Earth on Flat Paper, National Geographic
Society,  Washington,  D.C.,  1947 

Chayes, F., and Suzuki, Y., Geological contours and trend surface, J.
Petrology, v. 4 : 307-312, 1963

Cheshier,  R.G.,  Machine  information  search  system.  Bull.  Med.  Libr. 

Assn., v. 50 : 481-486, 1962

Christopherson ,  W.M.,  and  Mendez,  W.M. ,  A  local  geographic  study  of 

cervical  cancer.  International  Pathology,  v.  7,  no.  4  :  103-105,  1966 

Chung,  C.S.,  Genetic  analysis  of  human  family  and  population  data  with 
use  of  digital  computers,  Proc.  3rd  IBM  Med.  Symp.,  pp.  51-78,  1961 

Clearinghouse  for  Federal  Scientific  &  Technical  Information  (formerly 
Office  of  Technical  Services),  Selective  Bibliographies.  C.F.S.T.I. 
(O.T.S.), Washington, D.C., 1959-66

Coles,  M.W. ,  Applications  of  the  electronic  digital  computer  in  nautical 
cartography  (abstr.),  ACSM-ASP  1966  Convention  Prog.,  p.  13,  1966 

Colilla,  R.A. ,  and  Sams,  B.H.,  Information  structures  for  processing 
and  retrieving:  Communications  of  the  ACM,  1962  (?) 

College of American Pathologists, Systematized Nomenclature of Pathology
(SNOP), College of American Pathologists, Chicago, 1965, 439 pp.

Collins,  G.,  Display  software  technology.  Signal,  July  1966  issue 

Colner, B.J., Line-simulated map (abstr.), ACSM-ASP 1966 Convention
Prog.,  p.  14,  1966 

Commission  Cooperative  Technical  Africa,  Symposium  on  the  survey  needs 
of  developing  countries,  Report  of  the  Acting  Secretary-General  to 
the  18th  session  of  the  Commission,  Section  9,  1963 

Compendium Publications, Cardiovascular Compendium, v. 1, no. 5, 1966

Connelly, R.R., et al, End results in cancer of the lung - Comparison
of male and female patients, J. Natl. Cancer Inst., v. 36 : 277-287,
1966

Connor,  D.H.,  and  Lunn,  H.F.,  Buruli  ulceration:  Arch.  Path.  v.  81  : 
183-199, 1966

Control  Data  Corporation,  Control  Data  3600  Computer  System  Reference 
Manual,  Publ.  213b,  Control  Data  Corp.,  Minneapolis,  1963, 
approx.  100  pp. 




_ , Abstracts of available civil engineering applications,
Control Data Corp., Minneapolis, 1965, 6 pp.

Cooley, J.C., A Primer of Formal Logic, Macmillan, New York, 1942, 378 pp.

Coons,  S.A.,  Computer  graphics  and  innovative  engineering  design, 
Datamation, v. 12, no. 5 : 32-34, 1966


_ , The uses of computers in technology, Scientific American,
v. 215, no. 3 : 176-188, 1966


Coppock,  J.T.,  Electronic  data  processing  in  geographical  research. 

The  Professional  Geographer,  v.  14,  no.  4  :  1-4,  1962 

Corbin,  H.S.,  A  survey  of  CRT  display  consoles.  Control  Engineering, 

Reuben  H.  Donnelly  Corp.,  New  York,  1965,  8  pp. 

Creighton, R., The Pacific Project Data System - A tool for the utilization
of bird data. Information Systems Division, Smithsonian Inst.,
Washington,  D.C.,  1966 

_ , and Humphrey, P.S., Application of Automatic Data Processing
to the study of seabirds. Dept. of Vertebrate Zoology, Natural
History  Museum,  Smithsonian  Inst.,  Washington,  D.C.,  1966 


Cude, W.C., Surveying & Mapping, v. 22 : 413-436, 1962


Cunningham,  B.T.,  Coding  of  pathologic  diagnoses  at  the  Armed  Forces 

Institute of Pathology. Amer. J. Clin. Path., v. 25 : 1181-1182, 1955

Cutler, S.J., Trends in cancer therapy and patient survival, 1940 to 1959,
Natl. Inst. Health, Natl. Cancer Inst., Bethesda, Md., pp. 745-759,
1965

_ , and Latourette, H.B., A national cooperative program for
the evaluation of end results in cancer. J. Natl. Cancer Inst., v. 22 :
633-646, 1959

Datamation,  Data  transmission  systems.  Datamation,  v.  11,  no.  12  :  51-53, 
1965 

Davis, J.C., Application of response-surface analysis to sedimentary
petrology.  Kan.  Geol.  Surv.,  Computer  Contrib.  12,  pp.  57-62,  1967 

_ ,  and  Sampson,  R.J.,  FORTRAN  II  program  for  multivariate 

discriminant  analysis  using  an  IBM  1620  computer.  Kan.  Geol.  Surv., 
Comp.  Contr.  4,  1966,  8  pp. 


Dayhoff, M.O., A contour-map program for X-ray crystallography, Communications
of the ACM, v. 6 : 620-622, 1963


DeMeter,  E.R. ,  The  influence  of  automation  on  mapping  requirements  and 
techniques  (abstr.):  ACSM-ASP  1966  Convention  Prog.,  p.  11,  1966 




Dempsey,  J.R.,  A  generalized  two-dimensional  regression  procedure, 

Kan. Geol. Surv., Comp. Contrib. 2, 1966, 12 pp.

Department of Defense, Disease and Injury Codes, U.S. Army Tech. Bull.,

TB  MED  15,  1963,  approx.  650  pp. 

Department of the Army, Cartographic Aerial Photography. Tech. Man. TM
5-243, U.S. Govt. Printing Office, Washington, D.C., 1964, 63 pp.

Derrick, E.H., et al, Epidemiological observations on leptospirosis in
north Queensland, Australasian Annals of Medicine, v. 3, no. 2 :
85-97, 1954

Desautels, A.V., Automatic point marking and measuring instrument test
results (abstr.), ACSM-ASP 1966 Convention Prog., p. 12, 1966

Dietrich,  E.V.,  Machine  retrieval  of  pharmacological  data.  Science, 
v. 132 : p. 1556-1557, 1960

Digital  Equipment  Corp. ,  Computers  in  oceanography.  Publ.  G-8260, 

Digital  Eqpt.  Corp.,  Maynard,  Mass.,  1965,  8  pp. 

_ _ ,  The  Digital  Logic  Handbook,  1966-67  edition;  Digital 

Eqpt.  Corp.,  Maynard,  Mass.,  1966,  330  pp. 

Dillon, E.L., and Nichols, C.W., Handling of statistical well data by
computer,  Amer.  Assoc.  Petrol.  Geol.,  v.  49  :  1520-1531,  1965 

Dingman,  H.F.,  Computer  analysis  of  psychological  and  psychiatric  data, 
pp.  331-350  of  Stacy  &  Waxman,  Computers  in  Biomedical  Research,  1965 

Dixon,  P. ,  Decision  tables  and  their  application.  PR  7568,  Auerbach  Corp., 
(reprint  from  Computers  and  Automation,  v.  13,  no.  4),  1964,  8  pp. 

Dixon,  P.J.,  and  Sable,  J.,  DM-1  -  A  generalized  data  management  system. 
Spring  Joint  Computer  Conference,  AFIPS  Conf .  Proc. ,  v.  30  :  185-198, 
1967 

Documentation  Inc.,  Actual  and  potential  association  of  ideas  in 
information systems. Tech. Rept. 3, Contract Nonr-1305(00),
Documentation Inc., Washington, D.C., 1954, 6 pp.

Dodd, J.R., Cain, J.A., and Bugh, J.E., Apparently significant contour
patterns  demonstrated  with  random  data.  J.  Geol.  Ed.,  v.  13  : 

109-112,  1965 

Dorn,  H.F.,  and  Cutler,  S.J.,  Morbidity  from  cancer  in  the  United  States, 
Publ. Hlth. Mon. 56, Natl. Cancer Inst., Bethesda, Md., 1959

Dotson,  J.C.,  editor,  Short  course  on  computers  and  computer  applications 
in the mineral industry. College of Mines, Univ. of Arizona, 1961

_ , and Peters, W., editors, Computers and computer applications
in mining and exploration. Univ. of Arizona, Tucson, Ariz., 1961




Dreyfus,  R.H.,  et  al,  Tulane  Information  Processing  System.  Version  II 
Including  MEDITRAN.  Monogr.  3,  Computer  Science  Series,  Tulane 
Univ.,  1966,  45  pp. 

Drosness, D.L., et al, The application of computer graphics to patient
origin study techniques. Publ. Health Rept., v. 80 : 33-40, 1965

Duncan,  O.D.,  Cuzzort,  R.P.,  and  Duncan,  B.,  Statistical  Geography  — 
Problems  in  Analyzing  Areal  Data.  Free  Press  of  Glencoe,  Glencoe, 

Ill., 1961, 191 pp.

Dunlap and Associates, The Department of the Army ENVANAL System, 11th
Progr.  Rept.,  Contract  DA44-109-qm-1561,  Dunlap  &  Associates  Inc., 
Stamford,  Conn.,  1955,  25  pp. 

_ , ENVANAL -- Field test of system on Operation MOOSEHORN,
Progr. Rept. 2-2, Contract No. DA19-129-qm-390, Dunlap & Associates
Inc., Stamford, Conn., 1956, 163 pp.

_ , Project ENVANAL, Final Report on Research Phase, Contract
DA19-129-qm-390, Dunlap & Associates Inc., Stamford, Conn., 1956,
110 pp.

Eberhart,  J.,  About  the  systems  system.  Science  News,  v.  91  :  19,  1967 

Eden,  M. ,  Pattern  analysis,  Proc.  3rd  IBM  Med.  Symp.,  pp.  215-232,  1961 

Edmundson, H.P., Automatic abstracting; Final Rept. C107-3U1 (RADC-TDR-63-93),
Contract AF 30(602)2223, TRW Computers Co., Canoga Park,
Calif., 1963, 91 pp.

Einhorn, S.J., Reliability prediction for repairable redundant systems.
Proc.  of  the  IEEE,  v.  51  :  312-317,  1963 

Empey, S.L., Computer applications in medicine and the biological sciences
bibliography, Rept. SP-1025, System Development Corp., Santa Monica,
Calif., 1962, 38 pp.

Eubanks, F.R., and Baker, G.T., Array Research, Automated Mapping Systems.
Spec. Rept. 11, Contract AF 33(657)-12747, Texas Instruments Inc.,
Dallas, Tex., 1966, 40 pp.

Evans, D.C., Computer logic and memory: Scientific American, v. 215,
no.  3  :  74-85,  1966 

Fairchild Camera & Instrument Corp., Viewer, still picture. Final Development
Rept. (SME-AG-3; RADC TR 58-160), Contract AF 30(602)1727,
Fairchild Cam. & Inst. Corp., Syosset, N.Y., 1958, 18 pp.

Fano, R.M., and Corbató, F.J., Time-sharing on computers. Scientific
American, v. 215, no. 3 : 128-140, 1966

Favret,  A.G. ,  Introduction  to  Digital  Computer  Applications,  Reinhold, 

New  York,  1965,  246  pp. 




Feidelman,  L.,  A  survey  of  the  character  recognition  field,  PR  7593, 

Auerbach Corp., 1966, 25 pp.

Fleischer,  M. ,  Fluoride  content  of  ground  water  in  the  conterminous 
United States. U.S. Geol. Surv., Misc. Geol. Investig., Map I-387,

1962 

FMA  Inc.,  The  File-Search  System,  general  information  manual,  FMA  Inc., 
Washington,  D.C.,  1964,  28  pp. 

Fox,  W.T. ,  FORTRAN  IV  program  for  vector  trend  analyses  of  directional 
data.  Kan.  Geol.  Surv.,  Comp.  Contr.  11,  1967,  36  pp. 

Fuchs, A., Geography of eye disease. Notring der Wissenschaftlichen
Verbände Österreichs, Wien, 1962, 162 pp.

Garfield,  E.,  Citation  indexes  for  science  —  A  new  dimension  in 

documentation  through  association  of  ideas.  Science,  v.  122  :  108-111, 
1955 

Garfinkel,  D. ,  Digital  computer  simulation  of  ecological  systems.  Nature, 
v.  194  :  856-857,  1962 

_  _ ,  Programmed  methods  for  printer  graphical  output. 

Communications of the A.C.M., v. 5 : 477-479, 1962

_ , Simulation of ecological systems, pp. 205-216 of Stacy and

Waxman,  Computers  in  Biomedical  Research,  Academic  Press,  New  York,  1965 

_ , et al, Computer simulation and analysis of simple ecological

systems,  Ann.  N.Y.  Acad.  Sci. ,  v.  115  :  943-951,  1964 

Garrett,  P.,  Classification  system  for  any  data  banking  (information 

storage and retrieval) process. Res. Rept. 59-6, Contract Nonr-2666(00),
Benson-Lehner  Corp.,  Santa  Monica,  Calif.,  1959,  11  pp. 

Garvey,  W.D.,  and  Griffith,  B.C.,  Scientific  communication  as  a  social 
system.  Science,  v.  157  :  1011-1016,  1967 

Gaul, R.D., Instrumentation and data handling system for environmental
studies  off  Panama  City,  Fla.,  Ref.  no.  62-IT,  Contract  Nonr-211904, 
Texas  A.  &  M.  College,  College  Station,  Tex.,  1962,  6  pp. 

Gerard, R.W., Quantitation in biology, Proc. 4th IBM Med. Symp.,
pp.  29-48,  1962 

Giammo, T.P., A mathematical method for the automatic scaling of a
function, J. Assn. for Computing Machinery, v. 11, no. 1 : 79-83, 1964

Gilbert,  E.N. ,  Information  theory  after  18  years.  Science,  v.  152  : 

320-325,  1966 

Gilles,  H.M.,  Akufo  —  An  Environmental  Study  of  a  Nigerian  Village 
Community .  Ibadan  Univ.  Press,  Ibadan,  Nigeria,  1964,  80  pp. 




Gittelsohn, A.M., et al, Tabulation of vital records by computer. Public
Health  Rept.,  v.  79  :  895-904,  1964 

Gordon,  B.L.,  editor.  Current  Medical  Terminology  (CMT) ,  3rd  ed.,  Amer. 

Med.  Assoc.,  Chicago,  1966,  969  pp. 

Gosden, J.A., Estimating computer performance. Computer Journal, v. 5,
no.  4  :  276-283,  1964 

_ _ _ ,  and  Sisson,  R.L.,  Standardized  comparisons  of  computer 

performance. PR 0624, Auerbach Corp., (reprint from Proc. of IFIP
Congress  1962,  pp.  57-61),  1962 

Gray ,  d . ,  et  al ,  Information  retrieval  and  the  design  of  more  intelligent 
machines.  Final  Rept.  for  1  May  58-30  Jun  59,  Project  ADAR,  Task  E, 
Contract  DA  36-039-sc-75047 ,  Moore  School  of  Elect.  Eng.,  Univ.  Penn., 
1959,  216  pp. 

_ _ _ ,  and  Parker,  E. ,  Information  retrieval  and  the  design  of 

more  intelligent  machines.  Final  Rept.  for  1  Jul  59  -  30  June  60, 

Task  E,  Contract  DA  36-039-SC-75047,  Moore  School  of  Elect.  Eng., 

Univ.  Penn.,  1960,  77  pp. 

Greanias, E.C., The computer in medicine. Datamation, v. 11, no. 12 :

25-28,  1965 

Green,  M. ,  Design  factors  for  data  transmission  systems.  PR  7651,  Auerbach 
Corp., (1964 Internatl. Symposium on Global Communications), 1964, 19 pp.

Greenberger,  M. ,  The  uses  of  computers  in  organizations.  Scientific 

American, v. 215, no. 3 : 192-202, 1966

Greenly,  J.F.,  Standardization  of  typewriter  fonts  for  automatic  reading. 
Rept.  for  U.S.  Air  Force  (RADC-TR-65-523) ,  General  Precision’s  Link 
Group,  Binghamton,  N.Y.,  1966,  53  pp. 

Griffith,  W.H.,  A  study  of  the  rationale  and  techniques  for  long-range 
technological  forecasting  in  the  biological  and  medical  sciences. 

Rept.  for  Life  Sciences  Div.  of  Army  Research  Office,  Contract 
DA-49-092-ARO-9,  Fed.  Amer.  Soc.  for  Exper.  Biol.,  Washington,  D.C., 

1964,  52  pp. 

Griffiths,  J.C.,  Statistical  approach  to  the  study  of  potential  oil 
reservoir  sandstones,  pp.  637-668  of  Parks,  G.A.,  editor  Computers 
in  the  mineral  industries,  Stanford  Univ.  Press,  Palo  Alto,  1964 

Hambleton,  W.W.,  New  dimensions  for  mineral  resources  studies.  Kan.  Geol. 
Surv., Spec. Dist. Pub. 31, 1966, 7 pp.

Hammer,  C.,  Software  considerations  for  management  information  systems. 
Montreal  Chapter,  Data  Processing  Mgmt.  Assoc.,  Montreal,  Quebec, 

Canada;  (reprint  of  invited  paper  given  1  Jul  65  at  DPMA  Internatl. 

Data  Processing  Conf.,  Philadelphia),  1965,  29  pp. 




Harbaugh, J.W., Direct printing of computer maps of facies data by computer
(abstr.), Amer. Assoc. Petrol. Geol. Bull., v. 46 : 268, 1962

_ , Trend-surface mapping of hydrodynamic oil traps with the
IBM 7090/7094 computer. Quart. Colo. Sch. Mines, v. 59 : 557-578, 1964

_ , Mathematical simulation of marine sedimentation with
IBM 7090/7094 computers. Kan. Geol. Surv., Comp. Contr. 1, 1966, 52 pp.

_ , and Demirmen, F., Application of factor analysis to
petrologic variations of Americus Limestone (Lower Permian), Kansas and
Oklahoma, Kan. Geol. Surv., Spec. Dist. Publ. 15, 1964, 40 pp.

_ , and Preston, F.W., Fourier series analysis in geology.
Short Course and Symp. on Computers and Comp. Applications in Mining
and Exploration, Univ. of Arizona, v. 1 : R-1 - R-46, 1965

_ , and Wahlstedt, W.J., FORTRAN IV program for mathematical
simulation of marine sedimentation with IBM 7040 or 7094 computers,
Kan. Geol. Surv., Comp. Contr. 9, 1967, 40 pp.

Harris,  J.N.,  and  Madle,  E.J.,  The  ACM-52  automatic  clutter  mapper  and 
preliminary  experimental  results.  Tech.  Rept.  206,  Contract  AF 
19(604)5200, Lincoln Lab., Mass. Inst. of Tech., 1959, 52 pp.

Hastings,  A.D.,  Atlas  of  Arctic  Environment,  RER-33 ,  Regional  Environments 
Research  Branch,  U.S.  Army  Quartermaster  Research  and  Engineering 
Center,  Natick,  Mass.,  1961,  22  pp. 

Hathaway,  J.P.,  How  BUSHIPS  automatically  stores  and  retrieves  documents. 
Paper  presented  at  NARS  Symposium  on  Office  Information  Retrieval, 

1962,  6  pp. 

Hayes, O.B., et al, Computers in epidemiologic dietary studies, J. Amer.
Diet.  Assn. ,  v.  44  :  456-460,  1964 

Head,  R.V.,  Management  information  systems  —  A  critical  appraisal. 
Datamation,  v.  13,  no.  5  :  22-27,  1967 

Helava, U.V., A family of photogrammetric systems (abstr.), ACSM-ASP 1966
Convention  Prog.,  p.  18,  1966 

Hershey, A.V., The plotting of maps on a CRT printer, Rept. 1844, U.S. Naval
Weapons  Lab.,  Dahlgren,  Va.,  1963,  79  pp. 

Hienz, H.A., et al, Zur Frage der Dokumentation in der pathologischen
Anatomie. Medizinische Dokumentation, v. 5 : 10-12, 1961

Hobson, R.D., FORTRAN IV programs to determine surface roughness in
topography for the CDC 3400 computer, Kan. Geol. Surv., Comp.
Contr. 14, 1967, 28 pp.

Hodes, L., Machine processing of line drawings, Rept. (LL-54G-0028),
Contract AF 19(604)7400, Lincoln Lab., Mass. Inst. of Tech., 1961,
15 pp.




Hoffman,  J.,  Digitizing  bathymetric  data  aboard  ship  for  processing  by 
automation  (abstr.),  ACSM-ASP  1966  Convention  Prog.,  p.  23,  1966 

Hopps,  H.C.,  and  Gabrieli,  E.R.,  A  new  look  at  "normal"  values  in  clinical 
pathology  (editorial).  International  Pathology,  v.  9,  no.  1  :  10-11, 
1968 


Hopps,  H.  C.,  Data  versus  information  (editorial).  International  Pathology, 
v.  8,  no.  2  :  39-40,  1967 

Hopps, H. C., Information -- A problem in geographic pathology (editorial).
International Pathology, v. 8, no. 1 : 14-15, 1967

Horowitz,  A.S.,  Discussion  and  description  of  cataloging  system  for  the 
paleontological collections of the Indiana Univ. Dept. of Geology and
the  Ind.  Geological  Survey,  mimeographed  paper,  1964 

Hough,  P.V.,  General  purpose  visual  input  for  a  computer,  Ann.  N.Y.  Acad. 
Sci.,  v.  99  :  323-334,  1962 

Houston,  N.  and  Wall,  E. ,  The  distribution  of  term  usage  in  manipulative 
indexes, Amer. Documentation, v. 15 : 105-114, 1964

Humphrey, P.S., An ecological survey of the Central Pacific, Smithsonian
Year 1965 (Smiths. Inst., Ann. Rept. for year ended 30 Jun 65),
pp.  24-30,  1965 

Huntington,  E.  and  Shaw,  E.B.,  Principles  of  Human  Geography,  6th  ed., 

John  Wiley,  New  York,  1951 

International  Business  Machines  Corp.,  Numerical  code  for  states,  counties, 
and  cities  of  the  United  States.  Int.  Bus.  Mach.  Corp.,  New  York, 

1952,  81  pp. 

_ ,  Computer  Set  AN/GSQ-16 (XW-1) .  v.  1,2 

and  6,  Final  Rept.  (RADC-TR-59-110) ,  Contract  AF  30(602)1823,  Int.  Bus. 
Mach. Corp., Yorktown Heights, N.Y., 1959

_ ,  Proceedings  of  the  1st  and  2nd  IBM 

Medical  Symposia.  Int.  Bus.  Mach.  Corp.,  Yorktown  Heights,  N.  Y., 

1961,  427  pp. 


_ ,  Graphic  composing  techniques.  Final 

Rept.  (RADC  TDR  61-310),  Contract  AF  30(602)2527,  Thomas  J.  Watson 
Research  Center,  Int.  Bus.  Mach.  Corp.,  Yorktown  Heights,  N.Y.,  1962, 
67  pp. 


_ , Proceedings of the 3rd IBM Medical
Symposium, Int. Bus. Mach. Corp., Yorktown Heights, N.Y., 1962, 575 pp.


_ ,  Proceedings  of  the  4th  IBM  Medical 

Symposium,  Int.  Bus.  Mach.  Corp.,  Yorktown  Heights,  N.Y.,  1962,  512  pp. 


_ ,  Proceedings  of  the  5th  IBM  Medical 

Symposium,  Int.  Bus.  Mach.  Corp.,  Yorktown  Heights,  N.Y.,  1963,  502  pp. 




_ ,  Numerical  surface  techniques  and 

contour  map  plotting.  Publ.  E.  20-0117-0,  Int.  Bus.  Mach.  Corp., 

Yorktown Heights, N.Y., 1964

_ ,  Proceedings  of  the  6th  IBM  Medical 

Symposium. Int. Bus. Mach. Corp., Yorktown Heights, N.Y., 1964, 653 pp.

_ ,  Turning  time  ahead.  Computing  Report, 

v. 2, no. 3 : 8-12, 1966

Jackson,  V.N.,  The  multipoint  system  of  digital  structural  analysis 
(abstr.),  ACSM-ASP  1966  Convention  Prog.,  p.  19,  1966 

Jahn, T.L., The use of computers in systematics. J. Parasit., v. 48 : 656-663,
1962

James,  W.R.,  FORTRAN  IV  program  using  double  Fourier  series  for  surface 
fitting  of  irregularly  spaced  data.  Kan.  Geol.  Surv.,  Comp.  Contr.  5, 
1966,  19  pp. 

Janaske, P.C., editor, Information handling and science information -- a
selected bibliography 1957-1961, Biological Sciences Communication
Project, Amer. Inst. of Biol. Sci., Washington, D.C., 1962, approx.
100 pp.

Jenks,  G.F.  and  Brown,  D.A. ,  Three-dimensional  map  construction.  Science, 
v.  154  :  857-864,  1966 

Joint  Council  Subcommittee  on  Cerebrovascular  Disease,  Cerebrovascular 
Bibliography .  v.  5,  no.  2,  1965 

Journal  of  the  American  Medical  Association,  The  role  of  computers  in 

modern  medicine  (editorial),  J.  Amer.  Med.  Assoc.,  v.  196  :  196-197,  1966 

Juenemann, H.G., The design of a data-processing center for biological data,
Ann. N.Y. Acad. Sci., v. 115 : 547-558, 1964

Kaesler, R.L., et al, FORTRAN II program for coefficient of association
(MATCH-COEFF) using an IBM 1620 computer, Kan. Geol. Surv., Spec. Dist.
Publ. 4, 1963, 9 pp.

Kao, R.C., The use of computers in the processing of geographic information,
Geographical  Review,  v.  53  :  530-547,  1963 

Kaufman,  W.C.,  Standardization  of  symbols  and  units  for  environmental 
research, AMRL-TR-66-115, U.S. Air Force Aerospace Medical Research
Laboratories,  Wright  Patterson  A.F.B.,  Ohio,  1966,  4  pp. 

Kay, M., et al, The Catalog Input/Output System, Memorandum RM-4540-PR,
Rand Corp., Santa Monica, Calif., 1966, 71 pp.

Kellaway,  G.P.,  Map  projections,  Methuen,  London,  1949 

Kent, A., and Perry, J.W., The storage and retrieval of nonnumerical data
in large and complex documentation systems, Tech. Note 6, Contract AF
49(638)357, Center for Documentation and Communication Research,
Western Reserve Univ., 27 pp.


Keyser,  S.J.,  Advanced  language  processing  procedures,  ESD-TDR-63-620, 
Directorate  of  Computers,  U.S.  Air  Force  Systems  Command,  Bedford, 

Mass., 1963, 20 pp.

Kiefer, J., et al, Channels with arbitrarily varying channel probability
functions.  Information  and  Control,  v.  5  :  44-54,  1962 

King, G., et al, Automation and the Library of Congress, U.S. Govt.
Printing Office, Washington, D.C., 1964, 88 pp.

Kleinmuntz,  B.,  Clinical  information  processing  —  Problem-solving 
strategies.  Datamation,  v.  11,  no.  12  :  41-49,  1965 

Klingbiel, P.H., Language oriented retrieval systems, Armed Services
Technical Information Agency, Arlington, Va., 1962, 100 pp.

Korein, J., et al, Computer processing of medical data by variable field
length  format.  J.  Amer.  Med.  Assoc.,  v.  186  :  132-138,  1963 

_ , Computer processing of medical data by variable-field-length
format, J. Amer. Med. Assoc., v. 196 : 132-145, 1966

Krumbein,  W.C.,  Trend  surface  analysis  of  contour-type  maps  with  irregular 
control-point spacing, J. Geophys. Rsch., v. 64 : 823-834, 1959

_ , Computer analysis of stratigraphic maps (abstr.), Amer.
Assoc. Petrol. Geol. Bull., v. 46 : 270, 1962

_ , The computer in geology, Science, v. 136 : 1087-1092, 1962

_ , FORTRAN IV computer programs for Markov chain experiments
in geology, Kan. Geol. Surv., Comp. Contr. 13, 1967, 38 pp.

_ ,  and  Imbrie,  J.,  Stratigraphic  factor  maps,  Amer.  Assoc. 

Petrol. Geol. Bull., v. 47 : 698-701, 1963

_ , and Sloss, L.L., High-speed digital computers in stratigraphic
and facies analysis, Amer. Assoc. Petrol. Geol. Bull., v. 42 :
2650-2669, 1958

_ , and _ , Stratigraphy and Sedimentation, 2nd
ed., Freeman, San Francisco, 1963, 660 pp.

Lamson, B.G., and Dimsdale, B., A natural language information retrieval
system, mimeographed paper, Pub. Hlth. Svc., Grant HM 00300-01, Univ.
Cal., Los Angeles, 1966, 13 pp.

Latham, J.P., Possible applications of electronic scanning and computer
devices to the analysis of geographic phenomena, Tech. Rept., Contract
Nonr-551(29), Wharton School of Finance and Commerce, Univ. Penn., 195°,
27 pp.

_ , A study of the application of electronic scanning and
computer devices to the analysis of geographic phenomena, Final Rept.,
Contract Nonr-551(29), Wharton School of Finance and Commerce, Univ.
Penn., 1959, 6 pp.


Ledley, R.S., and Lusted, L.B., Reasoning foundations of medical diagnosis,
Science,  v.  130  :  7-21,  1959 

_ , and Ruddle, F.H., Chromosome analysis by computer,
Scientific American, v. 214, no. 4 : 40-48, 1966

Leeds,  H.D. ,  and  Weinberg,  G.M. ,  Computer  Programming  Fundamentals, 
McGraw-Hill,  New  York,  1961,  368  pp. 

Leibholz, S.W., Introduction to system effectiveness evaluation, PR 7500-057,
Auerbach Corp., (presented to George Washington Univ. School of
Engineering & Applied Science Center for Measurement Science), 1965

Levy, W.A., Techniques for digital representation of terrain (abstr.),
ACSM-ASP 1966 Convention Prog., 1965, p. 25

Lewis, C.I., and Langford, C.H., Symbolic Logic, 2nd ed., Dover, New York,
1959, 518 pp.

Lewis, R.F., KWIC -- Is it quick?, Bull. Med. Libr. Assoc., v. 52 : 142-147,
1964

Liberman,  E. ,  Descriptors  and  computer  codes  used  in  Naval  Ordnance 

Laboratory Library Retrieval Program, TR 64-20, U.S. Naval Ordnance Lab.,
White  Oak,  Md.,  1964,  228  pp. 

_ _ ,  and  Stevens,  H.L.,  Tables  of  four-letter  computer  codes 

used  in  library  retrieval  program,  Rept.  NOLTR  62-50,  U.S.  Naval 
Ordnance  Lab.,  White  Oak,  Md.,  1962,  680  pp. 

Licht, S., editor, Medical Climatology, Elizabeth Licht, New Haven, Conn.,
1962,  753  pp. 

Light, U.L., Ranger mapping by analytical topographic compilation (abstr.),
ACSM-ASP  1966  Convention  Prog.,  pp.  25-26,  1966 

Lipetz, B.A., Information storage and retrieval, Scientific American, v.
215, no. 3 : 224-242, 1966

Lipkin,  M.  ,  and  Woodbury,  M.A. ,  Coding  of  medical  case  history  data  for 
computer analysis, Communications of the ACM, 1962

Livingstone,  F.C.,  Computer  diagnosis.  Science  News,  v.  91  :  558,  1967 

Loomis, R.G., Boundary networks, Communications of the ACM, v. 8 : 44-48,
1965

Losee,  F.L.,  Trace  element  variables  related  to  oral  health,  pp.  41-54  of 

Environmental Variables in Oral Disease, Amer. Assoc. Advanc. Sci., 1966

Lunin, M., Coordinate indexing for information retrieval in an oral
pathology  department.  Oral  Surg.,  v.  18  :  484-493,  1964 

Maegraith, B., Exotic Diseases in Practice, William Heinemann Medical Books,
London, 1963, pp. 3nl


Manson, V., and Imbrie, J., FORTRAN program for factor and vector analysis
of geologic data using an IBM 7090 or 7094/1401 computer system, Kan.
Geol. Surv., Spec. Dist. Publ. 13, 1964

Marden, E.C., and Roller, H.R., Survey of computer programs for chemical
information searching, Tech. Note 85, U.S. National Bureau of Standards,
Washington, D.C., 1961, 87 pp.

Mason, E.E., and Bulgren, W.G., Computer Applications in Medicine,
C.C. Thomas, Springfield, Ill., 1963

Maxfield, M., et al, editors, Biophysics and Cybernetic Systems, Spartan
Books, Washington, D.C., 1965

Maxon,  R.O.,  Automation  in  program  management  (abstr.),  ACSM-ASP  1966 
Convention  Prog.,  p.  30,  1966 

May, J.M., editor, Studies in Disease Ecology, Hafner, New York, 1961,
613 pp.


McBroom,  P.,  Machines  cannot  think.  Science  News,  v.  90  :  6,  1966 

McCarthy, J., Information, Scientific American, v. 215, no. 3 : 64-73, 1966

McCormick, B.H., et al, ILLIAC III -- A processor of visual information,
Rept. 183, Dept. of Computer Science, Univ. Ill., Urbana, 1965, 8 pp.

_ , and Richardson, A.M., Design concepts for an information
resource center with option of an attached automated laboratory, Rept.
203, Dept. of Computer Science, Univ. Ill., Urbana, 1966, 79 pp.

McCracken, D.D., A Guide to FORTRAN Programming, John Wiley, New York, 1961,
87 pp.

_ , A Guide to ALGOL Programming, John Wiley, New York, 1962,
106 pp.


_ , A Guide to Programming, John Wiley, New York, 1963,
182 pp.

_ , A Guide to FORTRAN IV Programming, John Wiley, New York,
1965, 151 pp.


_ , et al, Glossary of computer terms (preprint from Programming
Business Computers), John Wiley, New York, 1964, 24 pp.

_ , and Dorn, W.S., Numerical Methods and FORTRAN Programming,
John Wiley, New York, 1964, 457 pp.

McCue, G.A., and DuPrie, H.J., Improved FORTRAN IV function contouring
program, SID-65-672, Space and Information Systems Div., North
American Aviation Inc., 1965, 31 pp.

McElroy, M.N., and Kaesler, R.L., Application of factor analysis to the
Upper Cambrian Reagan Sandstone of central and northwest Kansas, The
Compass, v. 42 : 188-201, 1965


McGlashan, N.D., The medical geographer's work, International Pathology, v. 7,
no. 3 : 81-83, 1966

McIntyre, D.B., Trend-surface analysis of noisy data, Kan. Geol. Surv.,
Comp. Contr. 12, pp. 45-56, 1967

McLean,  J.D.,  Code  book  for  strata  data,  McLean  Paleontological  Laboratory, 
Alexandria,  Va.,  1962 

_ , An application of electronic data processing techniques
to paleontology and stratigraphy, McLean Paleontological Laboratory,
Alexandria, Va., 1965, 1G pp.

_ , Cumulative index to card catalogs of Foraminifera and
Ostracoda, McLean Paleontological Laboratory, Alexandria, Va., 1965,
approx. 50 pp.

_ _ ,  Formats  and  procedures  for  use  in  data  processing  systems 

of  the  McLean  Paleontological  Laboratory.  McLean  Paleontological 
Laboratory,  Alexandria,  Va.,  1965,  7  pp. 

_ , Revision and new procedures for the stratigraphy and
ecology files, McLean Paleontological Laboratory, Alexandria, Va., 1965,
approx. 10 pp.

_ , Resumé of formats for Marine Biology Committee of the
Marine Technology Society, McLean Paleontological Laboratory,
Alexandria, Va., 1965, 12 pp.

Medical Tribune, Computer is aid in all research at University of Utah,
Medical  Tribune  and  Medical  News,  v.  7,  no.  54  :  20,  1966 

Mendelsohn, M.L., et al, Morphological analysis of cells and chromosomes
by digital computer, Proc. 6th IBM Med. Symp., pp. 409-416, 1964

Merriam,  D.F.,  Geology  and  the  computer.  New  Scientist,  (20  May  1965 
issue),  pp.  513-516,  1965 

_ , editor, Computer applications in the earth sciences --
Colloquium on classification procedures, Kan. Geol. Surv., Comp.
Contr. 7, 1966, 79 pp.

_ , Geologic use of the computer, Symp. on Recently Developed
Geologic Principles and Sedimentation of the Permo-Pennsylvanian of
the Rocky Mountains, 20th Annual Conf., Wyo. Geol. Assoc., pp. 109-112,
1966

_ ,  Computer  aids  exploration  geologists.  Oil  and  Gas  J., 

(23  January  1967  issue),  4  pp. ,  1967 

_ , and Cocke, N.C., editors, Computer applications in the
earth sciences -- Colloquium on trend analysis, Kan. Geol. Surv.,
Comp. Contr. 12, 1967, 62 pp.


_ , and Lippert, R.H., Pattern recognition studies of geologic
structure using trend-surface analysis, Quart. Colo. Sch. Mines, v. 59 :
237-245, 1964

_ _ ,  and _ ,  Geologic  model  studies  using  trend- 

surface analysis, J. Geol., v. 74 : 344-357, 1966

_ , and Sneath, P.H.A., Quantitative comparison of contour
maps, J. Geophys. Res., v. 71 : 1105-1115, 1966

_ , and _ , Comparison of cyclic rock sequences
using cross-association, Spec. Publ. 2, Dept. of Geology, Univ. of
Kansas, pp. 523-538, 1967

Merritt, C.A., Serving the needs of the information retrieval user, Spring
Joint Computer Conference, AFIPS Conf. Proc., v. 30 : 429-432, 1967

Miesch, A.T., Methods of computation for estimating geochemical abundance,
U.S. Geol. Surv., Prof. Pap. 574-B, pp. B1-B15, 1967

_ , Theory of error in geochemical data, U.S. Geol. Surv.,
Prof. Pap. 574-A, pp. A1-A17, 1967

_ , et al, Investigation of geochemical sampling problems by
computer simulation, Quart. Colo. Sch. Mines, v. 59 : 131-148, 1964

_ , and Connor, J.J., Investigation of sampling-error effects
in geochemical prospecting, U.S. Geol. Surv., Prof. Pap. 475-D,
pp. D84-D88, 1964

_ ,  and  Eicher,  R.N.,  A  system  of  statistical  computer 

programs for geologic research, Quart. Colo. Sch. Mines, v. 59 :
259-286, 1964

Miller, A.E., Data transmission -- The total systems concept, PR 7500-022,
Auerbach Corp., (reprinted from Data Systems Design), 1964, 4 pp.

Miller, G.B., Production and quality control in map printing at U.S.
Geological Survey (abstr.), ACSM-ASP 1966 Convention Prog., p. 33, 1966

Miller, R.L., and Kahn, J.S., Statistical analysis in the geological
sciences, John Wiley, New York, 1962, 483 pp.

Minsky,  M.L.,  Artificial  intelligence.  Scientific  American,  v.  215,  no.  3  : 
246-260,  1966 

Monmonier, M.S., The production of shaded maps on the digital computer,
Professional Geographer, v. 17, no. 5 : 13-14, 1965

Monroe Internatl. Inc., A brief on FORTRAN II, Publ. MO-402, Monroe
Internatl. Inc., Orange, N.J., 1965, 4 pp.

_ _ ,  A  brief  on  QUIKOMP,  Publ.  MO-401,  Monroe  Internatl.  Inc., 

Orange,  N.  J.,  1965,  4  pp. 


Montgomery, C.J., Computer permits simplified field surveying methods
(abstr.), ACSM-ASP 1966 Convention Prog., p. 33, 1966

Moore, G.P., Statistical analysis and functional interpretation of neuronal
spike data, Ann. Rev. Physiol., v. 28 : 493-522, 1966

Moser,  F. ,  A  computer  oriented  system  in  stratigraphic  analysis,  Inst. 
Science  and  Technology,  Univ.  Mich.,  1963 

Mullins, L.S., Sources of information on medical geography, (reprint),
pp. 230-242, 1967

Narasimhan, R., and Fornango, J.P., Some further experiments in the
parallel processing of pictures, File No. 616, Digital Computer Lab.,
Univ. Ill., Urbana, 1964, 13 pp.

National Academy of Sciences, Tropical Health, Publ. 996, Div. of Med. Sci.,
Natl. Acad. Sci.-Natl. Res. Counc., Washington, D.C., 1962, 540 pp.

National Oceanographic Data Center, Computer programs in oceanography,
Publ. C-5, Natl. Oceanogr. Data Ctr., Washington, D.C., 1964, 58 pp.

_ ,  Instructions  for  coding  and  keypunching  the  geological 

information form for core, grab, and dredge samples, Publ. M-5 (prov.),
Natl.  Oceanogr.  Data  Ctr.,  Washington,  D.C. ,  1964,  41  pp. 

_ , Processing physical and chemical data from oceanographic
stations -- Part 1, coding and keypunching, rev. ed., Publ. M-2, Natl.
Oceanogr. Data Ctr., Washington, D.C., 1964

_ , Manual for coding and keypunching biological data, Publ.
M-4 (prov.), Natl. Oceanogr. Data Ctr., Washington, D.C., 1965

Navy Publications and Printing Service, Electronography, U.S. Navy Publ.
and Prntg. Serv., 1964, 8 pp.

Nelson,  B.,  Machine  translation  —  Committee  skeptical  over  research 
support .  Science,  v.  155  :  58-59,  1967 

Nelson, D.B., Pick, R.A., and Andrews, K.B., GIM-1, A generalized
information management language and computer system, Spring Joint Computer
Conference, AFIPS Conf. Proc., v. 30 : 169-173, 1967

Newman, S., Storage and retrieval of contents of technical literature --
Nonchemical information, Res. & Dev. Rept. 4, U.S. Patent Office,
Washington, D.C., 1957, 15 pp.

Nooney, G.C., Mathematical models, reality and results, Proc. 5th IBM Med.
Symp., pp. 225-242, 1963

Nordbeck,  S.,  Location  of  Areal  Data  for  Computer  Processing,  Gleerup, 

Lund, Sweden, 1962

_ , and Bengtsson, B., Construction of isarithms and
isarithmic maps by computers, Nordisk Tidskrift for
Informationsbehandling, v. 2, 1964


O'Connor, J., Information retrieval by UNIVAC and by UNIVAC-produced
nonmechanized system, part 1, Tech. Rept. 18, Contract Nonr-2297(00),
UNIVAC Div., Sperry Rand Corp., Philadelphia, 1957, 98 pp.

Odoroff, M.E., and Abbe, L.M., Use of general hospitals -- demographic
and ecologic factors, Public Health Repts., v. 72 : 397-403, 1957

Oettinger,  A.G.,  The  uses  of  computers  in  science.  Scientific  American, 
v.  215,  no.  3  :  160-172,  1966 

Oliver, L.H., et al, An investigation of the basic processes involved in
the  manual  indexing  of  scientific  documents.  Rept.,  Contract  NSF  C-422, 
General  Electric  Co.,  Bethesda,  Md.,  1966,  approx.  130  pp. 

Ommaya, A.K., et al, A system of coding medical data for punched-card
machine retrieval as applied to epilepsy, Epilepsia, v. 5 : 192-200,
1964

Overhage, C.F.J., and Harman, R.J., INTREX -- Report of a Planning
Conference on Information Transfer Experiments, MIT Press, Mass. Inst. of
Tech., Cambridge, Mass., 1965

Panel on Information Science Technology, First Report of Panel 2 --
Information Sciences Technology, Committee on Scientific and Technical
Information, Federal Council for Science and Technology, Working Paper,
U.S. Office of Science and Technology, Washington, D.C., 1965, 9 pp.

Parkins,  P.V.,  BioSciences  Information  Service  of  Biological  Abstracts  — 
Abstracting  and  indexing  provide  input  for  a  dynamic,  computer-based 
information  system.  (Preprint  of  draft),  1966 

Parks,  G.A. ,  editor,  Computer  in  the  mineral  industries.  Stanford  Univ. 
Press,  Stanford,  Calif.,  1964 

Patrick,  R.L.,  Wordbook  of  computer  programming  terms.  Planning  Research 
Corp.,  Los  Angeles,  Calif.,  1964,  65  pp. 

Patterson, G.W., et al, What is a code?, Final Rept. for 1 May 58 -
30 Jun 59 on Project ADAR Task F, Contract DA 36-039-sc-75047, Moore
School of Electrical Engineering, Univ. Penn., 1959, 43 pp.

Paulus, J.E., High-speed system for handling of microform documents,
Mosler Safe Company, Hamilton, Ohio, 1966, 7 pp.

Pearn,  W.C.,  Finding  the  ideal  cyclothem,  Kan.  Geol.  Surv.,  Bull.  169, 
v. 2 : 399-413, 1964

Perkel,  D.H.,  A  digital-computer  model  of  nerve-cell  functioning. 

Memorandum  RM-4132-NIH,  Rand  Corp.,  Santa  Monica,  Calif.,  1964 

_ , Applications of a digital computer simulation of a neural
network, pp. 37-51 of Maxfield, M., et al, editors, Biophysics and
Cybernetic Systems, Spartan Books, Washington, D.C., 1965


_ , Statistical techniques for detecting and classifying
neuronal interactions, Proc. of Symp. on Information Processing in
Sight Sensory Systems, 1965

_ , et al, Pacemaker neurons -- Effects of regularly spaced
synaptic input, Science, v. 145 : 61-63, 1964

_ , and Moore, G.P., A defense of neural modeling, Publ.
P-3057, Rand Corp., Santa Monica, Calif., 1965, 9 pp.

Perring, F.H., and Walters, S.M., editors, Atlas of the British Flora,
Botanical Soc. of British Isles, publ. by Thomas Nelson & Sons Ltd.,
London, 1962

Perry,  K.E.  and  Aha,  E.J.,  The  Calliscope  —  A  versatile  alphanumeric 

display, Tech. Rept. 212, Contract AF 19(604)5200, Lincoln Lab., Mass.
Inst. of Tech., 1959, 18 pp.

Peters, B., Security considerations in a multiprogrammed computer system,
Spring Joint Computer Conference, AFIPS Conf. Proc., v. 30 : 283-286,
1967

Petersen, H.W., and Turn, R., System implications of information privacy,
Spring  Joint  Computer  Conference,  AFIPS  Conf.  Proc.,  v.  30  :  291-300, 
1967 

Phillips, W., et al, Person-matching by electronic methods, Communications
of  the  ACM,  v.  5  :  404-407,  1962 

Pierce, J.R., The transmission of computer data, Scientific American,
v.  215,  no.  3  :  144-156,  1966 

Pierce, J.W., and Good, D.I., FORTRAN II program for standard-size
analysis of unconsolidated sediments using an IBM 1620 computer, Kan.
Geol. Surv., Spec. Dist. Publ. 20, 1966, 19 pp.

Pitts, F.R., Chorology revisited -- Computerwise, Professional Geographer,
v. 14, no. 6 : 8-12, 1962

Preston, F.W., et al, The use of statistical communication theory for
characterization of porous media, Computers and Operation Research
in Mineral Industries, 6th Ann. Symp., Penn. State Univ., 1966, 20 pp.

_ , and Henderson, J.H., Fourier series characterization of
cyclic sediments for stratigraphic correlation, Kan. Geol. Surv.,
Bull. 169, v. 2 : 415-423, 1964

Pritchard, T.C., Automating the engineering product, Graphic Science,
v. 8, no. 3 : 21-24, 1966

Project LEX, DOD Manual for building a technical thesaurus, ONR-25, Office
of Naval Research, Washington, D.C., 1966, 24 pp.

Public  Health  Reports,  Electronic  data  processing  to  detect  hospital 
epidemic.  Public  Health  Repts. ,  v.  82  :  217-218,  1967 


Raisz, E., General Cartography, 2nd ed., McGraw-Hill, New York, 1948

Ratynski, M.F., The Air Force computer program acquisition concept, Spring
Joint Computer Conference, AFIPS Conf. Proc., v. 30 : 33-44, 1967

Raup, D.M., Computer as aid in describing form in gastropod shells,
Science, v. 138 : 150, 1962


_ , and Michelson, A., Theoretical morphology of the coiled
shell, Science, v. 147 : 1294-1295, 1965


Read, W.A., Trend-surface analysis of stratigraphic thickness data from
some Namurian rocks east of Stirling, Scotland, Scottish J. of
Geology, v. 2 : 96-100, 1966

Reitman, W.R., Information-processing models in psychology, Science,
v. 144 : 1192-1198, 1964

Rens, F.J., FORTRAN program for coordinate mapping using IBM 7090 computer,
Tech. Rept. 10, ONR Task 389-135, Contract Nonr-1228(26), Dept. of
Geography, Univ. Mich., 1965, approx. 20 pp.

Rentmeester, L.F., Cybernetics and cartography (abstr.), ACSM-ASP 1966
Convention Prog., p. 39, 1966

Revusky, S.K., Some statistical treatments compatible with individual
organism methodology, Rept. 716, U.S. Army Medical Research Lab.,
Ft. Knox, Ky., 1967, 22 pp.

Reza,  F.M. ,  An  Introduction  to  Information  Theory.  McGraw-Hill,  New  York, 
1961,  496  pp. 

Rich, W.H., and Terry, M.S., The industrial control chart applied to the
study of epidemics, Public Health Rept., v. 61 : 1501-1511, 1948

Ringertz, N., Possible interrelationships between bovine and human
leukemia -- A geographic study, International Pathology, v. 8,
no. 2 : 30-31, 1967

Roberts,  J.A.,  The  topographic  map  in  a  world  of  computers.  The 
Professional  Geographer,  v.  14,  no.  6  :  12-13,  1962 


Rogers, D.J., and Tanimoto, T.T., A computer program for classifying plants,
Science, v. 132 : 1115-1118, 1960


Rogers, F.A., The use of IBM tabulating methods in the analysis of
medical data, Minnesota Medicine, v. 44 : 332-386, 1961

Rosen,  C.A.,  Pattern  classification  by  adaptive  machines.  Science,  v.  156  : 
38-44,  1967 

Rosenfeld,  A.,  and  Pfaltz,  J.L.,  Sequential  operations  in  digital  picture 
processing.  Journal  of  the  ACM,  v.  13  :  471-494,  1966 


Ross ,  D. ,  The 


i_,  Datamation,  v.  13,  no.  5  :  II,  1967 


Sabins, F.F., Computer flow diagram in facies analysis, Amer. Assoc. Petrol.
Geol. Bull., v. 47 : 2045-2047, 1963

Sackin, M.J., et al, ALGOL program for cross-association of nonnumeric
sequences using a medium-size computer, Kan. Geol. Surv., Spec. Dist.
Publ. 23, 1965, 36 pp.

Sampson,  R.J.,  and  Davis,  J.C.,  FORTRAN  II  trend-surface  program  with 

unrestricted input for IBM 1620 computer, Kan. Geol. Surv., Spec. Dist.
Publ. 26, 1966, 13 pp.

_ _ _ _ ,  and  _ _  _ _  ,  Three-dimensional  response 

surface  program  in  FORTRAN  II  for  the  IBM  1620  computer.  Kan.  Geol. 
Surv.,  Comp.  Contr.  10,  1967,  20  pp. 

Sayer,  J.S.,  The  use  of  information  technology  in  research  and  development 
planning.  PR  7500-055,  Auerbach  Corp.,  Philadelphia,  1964 

_ _ _ _ ,  Usage  of  emerging  technology  in  program  execution. 

PR  7579,  Auerbach  Corp.,  Philadelphia,  1964,  22  pp. 

Scheele,  M. ,  Punch-card  methods  in  research  and  documentation  with  special 
reference  to  biology.  Lib.  Sci.  and  Doc.  no.  2,  Interscience 
Publishers,  New  York,  1962,  282  pp. 

Schlager,  C.W.,  Growing  importance  of  map  substitutes  (abstr.),  ACSM-ASP 
1966  Convention  Prog.,  p.  32,  1966 

Schmid,  C.F.,  and  MacCannell,  E.H.,  Basic  problems,  techniques,  and  theory 
of isopleth mapping, J. of Amer. Statistical Assoc., v. 50, 1955

Schmitt,  O.H.,  and  Caceres,  C.A.,  editors,  Electronic  and  Computer- 
Assisted  Studies  of  Biomedical  Problems.  C.C.  Thomas,  Springfield, 

Ill.,  1964 

Schoenfeld, R.L., The role of a digital computer as a biological instrument,
Ann.  N.Y.  Acad.  Sci.,  v.  115  :  915-942,  1964 

Schultz, C.K., et al, Optimization and standardization of information
retrieval  language  and  systems.  Tech.  Status  Rept.  1,  Contract  AF 
49(638)835,  Univac  Div.,  Sperry  Rand  Corp.,  Philadelphia,  1961,  50  pp. 

Science News, Machine translation, Science News, v. 91 : 265, 1967

_ , Three-D plotter, Science News, v. 92 : 118, 1967

Shafritz, A.B., and Rose, K., Overflow storage in a store-and-forward
digital  storage  system.  Data  Systems  Engineering,  v.  19,  no.  1,  1964 

Siekmeier,  D.,  Apparatus  for  the  real-time  transmission  of  handwriting  and 
map information to remote displays, Rept. 2900-300-R, Contract
DA 36-039-SC-78801, Inst. of Sci. and Tech., Univ. Mich., 1962, 29 pp.

Simpson,  M.H. ,  Cataloguing  and  retrieval  of  environmental  information  — 

A  statement  of  the  problem,  AFA  R1713,  Army  Frankford  Arsenal, 
Philadelphia,  1964,  32  pp. 


Sisson,  R.L.,  Computer  output  and  display  devices,  Ann.  N.Y.  Acad.  Sci., 
v.  115  :  627-643,  1964 

Skinner, F.D., Computer graphics -- Where are we?, Datamation, v. 12, no. 5 :
28-31, 1966

Smillie, K.W., Electronic digital computers and their use in entomology,
Entomol. Soc. Canada, Mem. 32, pp. 11-15, 1963

Smith,  W.A.,  Nature  and  detection  of  errors  in  production  data  collection. 
Spring Joint Computer Conference, AFIPS Conf. Proc., v. 30 : 425-428,
1967 

Snipes, D.S., and Butler, J.R., Digital computer program for identification
of minerals by X-ray diffraction, J. Elisha Mitchell Sci. Soc.,
v. 78 : 97, 1962

Sokal, R.R., and Sneath, P.H.A., Principles of Numerical Taxonomy,
W.H. Freeman, San Francisco, 1963, 359 pp.

Soper,  J.H.,  Mapping  the  distribution  of  plants  by  machine.  Canadian  J.  of 
Botany, v. 42 : 1087-1100, 1964

Spacelabs Inc., X-15 data display system, NASA Contractor Rept. CR-460,
Contract NAS 4-589, Spacelabs Inc., Van Nuys, Calif., 1966

Speakman, E.D., Automated production planning and control (abstr.), ACSM-ASP
1966 Convention Prog., p. 36, 1966

Spitz, O.T., Generation of orthogonal polynomials for trend surfacing with
a digital computer, Computers and Operation Research in Mineral
Industries, 6th Ann. Symposium, Penn. State Univ., 1966, 6 pp.

Stacy, R.W., and Waxman, B.D., Computers in Biomedical Research, Academic
Press,  New  York,  vols.  1  and  2,  1965 

Stamp, L.D., The Geography of Life and Death, Fontana Library, Collins,
London,  1964 

Statland, N., Methods of evaluating computer systems performance, Computers
and  Automation,  v.  13,  no.  2,  1964 

_ , and Hillegass, J.R., A survey of computer input-output
equipment, PR 7533, Auerbach Corp., (repr. from Data Processing
Yearbook), 1963(?), 20 pp.

_ _ ,  and  _ _ _ ,  Random  access  storage  devices, 

PR  7556,  Auerbach  Corp.,  (reprint  from  Datamation),  1963,  9  pp. 

Steakley, J.E., Automated color-separation system (abstr.), ACSM-ASP 1966
Convention Prog., p. 46, 1966

Steinberg, A., and P.^rne, L.W., Methods and techniques of data conversion,
Ann. N.Y. Acad. Sci., v. 115 : 614-626, 1964


Sterling, T., and Pollack, S., MEDCOMP handbook of computer applications
in biology and medicine -- Part 1, Statistical systems, Medical
Computing Center, College of Medicine, Univ. Cincinnati, 196', r'6 o,

Strachey, C., System analysis and programming, Scientific American, v. 215,
no. 3 : 112-124, 1966

Sublette, I.E., Recognition of class membership by means of weak,
statistically dependent features, AMRL-TR-66-174, U.S. Air Force
Aerospace Medical Research Lab., Wright-Patterson A.F.B., Ohio,
1966, 39 pp.

Suppes, P., The uses of computers in education, Scientific American,
v. 215, no. 3 : 206-220, 1966

Sutherland, I.E., Computer graphics, Datamation, v. 12, no. 5 : 22-27, 1966

_ , Computer inputs and outputs, Scientific American, v. 215,
no. 3 : 86-96, 1966

Switzer, P., et al, Statistical analysis of ocean terrain and contour
plotting procedures, A.D. Little, Cambridge, 1964

Tatch, D., Automatic encoding of medical diagnoses, Proc. 6th IBM Med.
Symp., pp. 545-531, 1964

Taube, M., Computers and common sense -- The myth of thinking machines,
Columbia  Univ.  Press,  New  York,  1961 

Tewinkel, G.C., Block analytic aerotriangulation (abstr.), ACSM-ASP 1966
Convention Prog., p. 38, 1966

Thoma, J.A., Simple and rapid method for the coding of punched cards,
Science, v. 137 : 278-279, 1962

Thompson, E.T., and Hayden, A.C., Standard Nomenclature of Diseases and
Operations,  5th  ed.,  McGraw-Hill,  New  York,  1961,  964  pp. 

Tjalma, R.A., et al, Clinical records systems and data retrieval function
in veterinary medicine -- A proposal for systematic data programming,
J. Amer. Vet. Med. Assn., v. 145 : 1189-1197, 1964

Tobler, W.R., Automation and cartography, Geographical Rev., v. 49 : 526-534,
1959

_____, Geographical ordering of information, The Canadian
Geographer, v. 7, no. 4 : 203-205, 1963

_____, Automation in the preparation of thematic maps, The
Cartographic J. (reprint, 7 pp.), 1964

_____, Computation of the correspondence of geographical
patterns, Papers of the Regional Science Association, p. 131-139,
1965 (?)




_____, Numerical map generalization, Univ. Mich., Department of
Geography, Michigan Inter-University Community of Mathematical
Geographers Discussion Paper 8, Part 1, 1966, 25 pp.

_____, Spectral analysis of spatial series (reprint), 1967, 8 pp.

Tolles, W.E., editor, Computers in medicine and biology, Ann. N.Y. Acad.
Sci.,  v.  115  :  543-1140,  1964 

Toomey,  D.F.,  Application  of  factor  analysis  to  a  facies  study  of  the 

Leavenworth Limestone (Pennsylvanian-Virgilian) of Kansas and environs,
Kan.  Geol.  Surv.,  Spec.  Dist.  Publ.  27,  1966,  26  pp. 

Turner,  A.H.,  editor,  Computers  in  medicine  bibliography,  Dept,  of 

Radiology  and  Medical  Center  Library,  School  of  Medicine,  Univ.  Miss., 
1965,  164  pp. 

_ ,  and  Schmidt,  D.A. ,  Computers  in  medicine  bibliography . 

rev.  ed.,  Dept,  of  Radiology  and  Medical  Center  Library,  Univ.  Miss., 
1966 

Univac Division, Electronic data-processing for the line official, Publ.
U 2448E, Univac Div., Sperry Rand Corp., Philadelphia, 1960, 86 pp.

_____, Mighty new servant to the mind of man, Publ. U-4323,
Univac Div., Sperry Rand Corp., Philadelphia, 1964, 30 pp.

University of Pittsburgh Department of Geography, Arthropod distribution
maps, sponsored by U.S. Army Natick Laboratories, Natick, Mass., 1965-67

Urban Renewal Service, Using computer graphics in community renewal --
Computer methods of graphing, data positioning and symbolic mapping,
Community Renewal Program Guide 1, Urban Renewal Administration,
Washington, D.C., 1963, approx. 200 pp.

U.S.  Geological  Survey,  National  Atlas,  (in  preparation) 

Vallbona, C., et al, System for processing clinical research data, Proc.
6th IBM Med. Symp., pp. 437-486, 1964

Vitro Laboratories, Plan of action for U.S. Naval Oceanographic Office
Library study, Rept., Contract N62306-1828, Vitro Corp. of America,
Silver Spring, Md., 1966, approx. 20 pp.

_____, System performance specification for U.S. Naval Oceanographic
Office Library study, Rept., Contract N62306-1828, Vitro Corp.
of America, Silver Spring, Md., 1966, approx. 20 pp.

_____, The user requirements specification for U.S. Naval
Oceanographic Office Library study, Rept., Contract N62306-1828, Vitro
Corp. of America, Silver Spring, Md., 1966, approx. 20 pp.

Vogel, P., An inventory of geographic research of the humid tropic
environment, Contract DA49-092-ARO-33, Texas Instruments Inc., Dallas,
Tex., 518 pp.




Walker, A., editor, Proceedings of the Symposium on Development and

Management  of  a  Computer-Centered  Data  Base,  System  Development  Corp., 
Santa  Monica,  Calif . ,  1964,  133  pp. 

Walker, A.R.P., Complexity of nutritional problems in developing countries,
International Pathology, v. 8, no. 4 : 71-73, 1967

Wall, E., The distribution of term usage in manipulative indexes, American
Documentation, v. 15, no. 2, 1964

Wallace, R.E., Available communications equipment and status of the art,

PR  7685,  Auerbach  Corp.,  Philadelphia,  1964,  11  pp. 

Warden, J., CalComp Plotter Manual, Informal Manuscript Rept. Misc.-1-65,
U.S. Naval Oceanographic Office, Washington, D.C., 1965, 45 pp.

Ware, W.H., Security and privacy in computer systems, Spring Joint Computer
Conference, AFIPS Conf. Proc., v. 30 : 279-282, 1967

_____, Security and privacy -- Similarities and differences,
Spring Joint Computer Conference, AFIPS Conf. Proc., v. 30 : 287-290,
1967

Warren, H.V., Medical geology and geography, Science, v. 148 : 534-539, 1965

Watson,  C. ,  Computer  generation  of  word  association  maps  for  man-machine 
communication ,  Publ.  SP-il'3,  System  Development  Corp.,  Santa  Monica, 
Calif . ,  1963,  24  pp. 

Watson, D.E., et al, Compilation of magnetic charts by analytical procedures
utilizing computer techniques (abstr.), ACSM-ASP 1966 Convention Prog.,
p. 50, 1966

Watson, F.R., Coordination of data and compilation of forms for the computer,
in Proc. Conf. Consultants, Guests, and Resident Staff, 19-20 September
1963, Center for Zoonoses Research, Univ. Ill., Urbana, 1963

Webb, G.N., Communicating biological information to the 1401 computer,
coding, editing, and interfacing -- Problems and results, Proc. 6th
IBM Med. Symp., pp. 69-80, 1964

Weemuel rer, F., Codeless scanning -- A new method of automatic documentation
[Germ.], Experientia, v. 1 : 383-384, 1960

Weik, M.H., and Confer, V.J., Survey of scientific and technical information
retrieval schemes within the Department of the Army, BRL Rept. 69,
Aberdeen Proving Ground, Md., 1962, 93 pp.

Werling, R., Action-oriented information systems, Datamation, v. 13, no. 6 :
37-65, 1967

Whiteman, I.R., The role of computers in handling aerospace systems human
factors task data, Rept. AMRL-TR-65-206, Computer Concepts Inc., 1965,
182 pp.




Whitfield, H., Application of a taxonomy computer programme to disease
classification (abstr.), Biometrics, v. 19 : 368, 1963

Whitlock,  L.S.,  Information  coding  and  retrieval  of  nematology  literature 
on  IBM  1620  computer  (abstr.),  Dissert.  Abst.,  v.  24  :  927,  1963 

Wolf,  M. ,  Computational  techniques  in  linguistic  geography,  (reprint),  1966 

Woodbury,  M.A. ,  Time  series  factor  analysis,  Proc.  2nd  IBM  Med.  Symp. , 
pp.  385-390,  1960 

World  Health  Organization,  Trends  in  the  study  of  morbidity  and  mortality. 
Public  Health  Paper  27,  1965,  196  pp. 

_____, Computers in medicine, W.H.O. Chronicle, v. 21 : 100-111,
1967

Wright,  K.T.,  Marienfeld,  C.J.,  and  Silberg,  S.L.,  "Place"  in  environmental 
epidemiology  of  rectangular  coordinate  method.  Public  Health  Reports, 
v.  83,  no.  5  :  427-434,  May  1968 

Wright, J.K., A proposed Atlas of Diseases. Appendix 1, Cartographic
considerations, Geogr. Rev., v. 34, 1944

Yamaha, S., and Fornango, J.P., Experimental results for local filtering
of digitized pictures, Rept. 184, Dept. of Computer Science, Univ.
Ill., Urbana, 1965, 44 pp.

Yoder,  F.D.,  Data  processing  in  public  health.  Proc.  4th  IBM  Med.  Symp., 

pp.  ;  ’3-204,  1962 

Yoder,  R.i>.,  Tulane  Information  Processing  System,  version  1.  Monogr.  1, 
Computer  Science  Series,  Tulane  Univ.,  1965 

Zimmer, H., Preparing psychophysiologic analog information for the
digital computer, Behav. Sci., v. 6 : 161-164, 1961

Zubryn, E., Electronic herbarium, Science News, v. 92 : 161, 1967


A recent publication which became available too
late to include in its proper (alphabetical) place,
but that is too important to omit from this listing,
is:

Lindberg,  D.A.B.,  The  Computer  and  Medical  Care.  Charles  C.  Thomas, 
Springfield,  Ill.,  1968,  210  pp. 


Appendix 


The Appendix includes the following:

Glossary
-- Computer processing terms, A-2
-- Biomedical terms, A-6

Data sources
-- Narrative and tabular, A-10
-- Published maps, A-16

Schistosomiasis, A-25

Leptospirosis, A-28




This monograph incorporates technical information derived from several
disciplines, each with its own jargon. In the interests of effective
communication, some of the terms which have "special" meaning have been
selected for brief explanation here. We realize full well that epidemiologic
terms do not need to be explained to the epidemiologist, nor cartographic
ones to the cartographer -- but the cartographer may find a definition
of certain epidemiologic terms helpful, and vice versa. Then, too,
some of the terms listed here have varied meanings, even within the primary
discipline, depending upon who uses them and in what context. We have tried
to be precise and consistent in our usage of these terms, adhering to the
meanings given here.

For  convenience,  the  glossary  is  divided  into  two  parts:  Part  one 
considers  data-processing  terms;  Part  two,  biomedical  terms. 

MOD DATA PROCESSING TERMS


Block Diagram -- A representation of spatial or areal relationships on the
earth's surface that is drawn obliquely to that surface but which,
otherwise, is the same as a map. When used to present geologic data,
block diagrams usually show a horizontal surface area and two vertical
cross-sections, but when used to present disease data, the two vertical
cross-sections are usually unnecessary and are often omitted.

CEN -- see Computer Evaluation Number.

C-MOF -- see Common-MOF.

Common-MOF (C-MOF) -- A MOF which should, and usually does, accompany (as a
necessary descriptive element), or should be common to, every data
point or bit of mappable data. In the MOD system only six C-MOF's are
recognized (see p.

Computer Evaluation Number (CEN) -- A number calculated by the MOD computer
system, according to an appropriate algorithm, to indicate the relative
reliability of each data point that is input to the system.

Data Point -- A specific geographic locality where a particular factor/
aspect/facet of the total disease/environmental situation has been
determined/observed/measured, and the result/evaluation/value expressed
in some qualitative/quantitative form. In manual mapping
procedures a data point is represented by a dot at the location of the
point, with a symbol beside or overprinted on the dot to indicate the
value of the data point. In the MOD system a data point consists of
a geographic location (LOC), a data-point value (VAL), a factor
(HOF or POF), and narrative (NAR).
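
The four parts of a MOD data point named above can be pictured as a simple record. The following is only an illustrative sketch in Python, not the MOD system's actual storage format: the class name, field names, field types, and the sample coordinates and values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical record mirroring the glossary's LOC / factor / VAL / NAR."""
    loc: Tuple[float, float]      # LOC as (LA, LO) in decimal degrees (assumed encoding)
    factor: Tuple[str, ...]       # HOF or POF, written as a combination of LOF names
    val: Union[str, float]        # VAL, either qualitative (words) or quantitative (number)
    nar: str = ""                 # NAR, supporting nonmappable text

    def is_quantitative(self) -> bool:
        # Per the glossary, a VAL expressed in numerals is "quantitative".
        return isinstance(self.val, (int, float))


# Made-up example point; the location and values are not from the report.
point = DataPoint(
    loc=(9.0, 8.0),
    factor=("Leptospira pomona", "human beings"),
    val=127.0,
    nar="cases per 100,000 population, hypothetical survey",
)
print(point.is_quantitative())
```

Keeping NAR as an optional field reflects the glossary's statement that only location, factor, and value are essential components of a data point.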

Disease Map -- A map showing some aspect, facet, or factor of the total
disease  situation  (ecology). 

Factor -- Alphabetic and/or numeric symbols naming/describing exactly
what part/aspect/facet of the total disease/environmental situation
is being evaluated (i.e., given a VAL) at (the LOC of) the specific
data point. Factor is a general term that includes LOF's, MOF's,
HOF's, and POF's, and is one of the three essential components of a
data point.

Graph -- Straight/curved lines, points, and words/numbers, all representing
numerical  data  which  express  the  relationship  among  specific  variables. 

High-Order Factor (HOF) -- A specific combination of LOF's in which each
LOF  belongs  to  (is  drawn  from)  a  different  MOF;  i.e.,  a  specific 
combination  of  LOF's  to  which  no  MOF  contributes  more  than  one  LOF 
(see  p.  4-7). 

HOF  —  see  High-Order  Factor. 

LA -- see Latitude.

Latitude  (LA)  —  Angular  distance  along  earth's  surface  as  measured  north 
or  south  from  equator. 

LO  —  see  Longitude. 

LOC  —  see  Location  (Geographic). 

Location (Geographic) (LOC) -- The exact geographic position, stated as
precisely as possible, of the data point. Location is one of the
three essential components of a data point (i.e., each bit of mappable
data).

LOF -- see Low-Order Factor.

Longitude (LO) -- Angular distance along earth's surface as measured east
or west from Greenwich meridian.

Low-Order Factor (LOF) -- The most specific possible name or description
of a particular disease/environmental situation.

Map -- A graphic/visual presentation, on a geographic-coordinate basis,
of the information imparted by a particular set of specific data
points. A map is a representation of spatial or areal relationships
on the earth's surface, drawn perpendicularly to that surface and
according to a rigorous grid pattern and scale so that there results
no nonsystematic distortion of size, shape, distance, and neighbors.
A map is, essentially, a three-variable graph in which X = LO, Y = LA,
and Z = value of whatever factor is being mapped.

Middle-Order Factor (MOF) -- The set of all LOF's which describe the same
aspect/facet of disease/environmental situations. (See p.

MOF -- see Middle-Order Factor.

Multi-LOF MOF -- A MOF which can contain more than one LOF for each data
point. For example, the MOF "Specific Disease Agent" can include
several LOF's: "Leptospira pomona, L. canicola, and L. sejroe", all
at one data point.

NAR -- see Narrative.

Narrative (NAR) -- Supporting, nonmappable prose/narrative/textual
information or data associated with a specific data point.

O-MOF -- see Optional-MOF.

Optional-MOF (O-MOF) -- A MOF which need not fit into every possible
disease/environmental data point and which, in a sense then, is
optional. This category includes all MOF's except Common-MOF's.

POF  —  see  Poly-Order  Factor. 

Poly-Order  Factor  (POF)  —  A  specific  combination  of  LOF's  in  which  at 
least  two  LOF's  belong  to  (are  drawn  from)  the  same  MOF;  i.e.,  a 
specific  combination  of  LOF's,  to  which  at  least  one  MOF  contributes 
more  than  one  LOF.  (See  p. 
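
The distinction between a High-Order Factor and a Poly-Order Factor reduces to a counting rule: tally how many LOF's each MOF contributes to the combination. The sketch below illustrates that rule in Python under stated assumptions: the function name and the (MOF, LOF) pair representation are hypothetical, and a combination of a single LOF is treated as a HOF, which the glossary does not address.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_factor(lofs: Iterable[Tuple[str, str]]) -> str:
    """Return "HOF" or "POF" for a combination of (MOF name, LOF name) pairs.

    HOF: no MOF contributes more than one LOF.
    POF: at least one MOF contributes more than one LOF.
    """
    counts = Counter(mof for mof, _lof in lofs)
    if not counts:
        raise ValueError("a factor needs at least one LOF")
    return "POF" if max(counts.values()) > 1 else "HOF"


# Each LOF comes from a different MOF, so this combination is a HOF.
hof = [("Specific Disease Agent", "Leptospira pomona"),
       ("Total Annual Rainfall", "13 inches")]

# Two LOF's drawn from the same (multi-LOF) MOF, so this combination is a POF.
pof = [("Specific Disease Agent", "Leptospira pomona"),
       ("Specific Disease Agent", "L. canicola")]

print(classify_factor(hof), classify_factor(pof))
```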

Primary Data Point -- A data point extracted from text that originally
reported that data, i.e., from its primary source document.

Qualitative -- Expressed or denoted by alphabetic symbols or words. When
so  expressed,  LOF's  (and  MOF's  containing  them)  and  VAL's  can  be 
termed  "qualitative". 

Quantitative  —  Expressed  or  denoted  by  numeric  symbols  or  numerals.  When 
so  expressed,  LOF's  (and  MOF's  containing  them)  and  VAL's  can  be 
termed  "quantitative". 

Report  (Hard-Copy  Report)  —  Printed  words  and/or  numbers  arranged  in 
listings,  tables,  or  narrative- like  prose. 

Secondary Data Point -- A data point extracted from text that references
a previous report of that data, i.e., from its secondary source document.

Single-LOF MOF -- A MOF which can contain only one LOF for each data point.
For example, the MOF, "Total Annual Rainfall", can include only one
LOF, e.g., "13 inches", at one data point.

System -- Used in two senses in this report, but differentiated by context:
1. MOD system, consisting of the personnel, procedures,
programs, and equipment (including computer), integrated
to perform mapping of disease/environmental data;

2. MOD computer system, consisting only of the various
programs and equipment mentioned above.

System Analysis -- Investigation of an activity or procedure to determine
what that activity/procedure must accomplish, what it has available to
it, and how its necessary operations may best be accomplished (either
manually or by computer).

System Design -- Planning of a system by specifying the characteristics,
actions, and relationships among the various parts (personnel, programs,
and equipment) of the system.

System Implementation -- Actual construction of a system, including
production of programs, installation of equipment, hiring and training
of personnel which -- all together -- comprise the functional system.

System Operation -- Operation of the system on a regular production basis
in which the personnel utilize the system procedures, programs, and
equipment to accomplish the task for which the system was designed
and implemented.

VAL -- see Value (for Data Point).

Value (for Data Point) (VAL) -- An alphabetic and/or numeric symbol
expressing the precise character/condition of that aspect/factor (of
the disease/environmental situation) being considered at (the LOC of) the
specific data point. Value is one of the three essential components of a
data point.


BIOMEDICAL  TERMS 


Carriers (of disease) are infected persons who harbor an infectious agent
and who are capable of transmitting this agent, but who have no obvious
manifestations of the disease.

Contamination: See infection.

Ecology  is  the  study  of  relationships  between  living  organisms  and  their 
habitat  —  an  analysis  of  the  biodynamics  within  communities.  The 
ecology  of  disease  is  the  study  of  relationships  among  hosts,  disease 
agents,  and  their  environments . 

Endemic/enzootic diseases are those which are present in a given community
(human beings/animals) at all times, but at a low level.
Hyperendemic/hyperenzootic diseases are those continuously present at a
high rate in human beings/animals.

Epidemic/epizootic diseases are those intermittently present at a high rate
in a (relatively) small area. They may be diseases new to the community
or diseases that were continually or sporadically present at
low levels, but that are now occurring at a much higher rate than
usual. The suffix, -demic, relates to human beings; the suffix,
-zootic, relates to animals.

Geographic pathology is, in a sense, a kind of comparative pathology -- one
in which place (rather than species) is the primary variable. It is
concerned with what diseases occur where, and why. It is also concerned
with the reasons why the "same" disease (in terms of causative
agent) may behave quite differently in different parts of the world.

Host is an animal (or a plant) which harbors an infectious agent. The host
may or may not suffer disease as a result.

Hyperendemic/hyperenzootic: See endemic/enzootic.

Immunity may be relative or absolute. Absolute immunity protects against
disease; relative immunity attenuates the disease. Immunity is of two
types: innate immunity, relating to those factors inherent in the
body that act to resist an infectious agent -- and acquired immunity,
a state of increased resistance that is related to the presence of
specific (acquired) antibodies against the infectious agent. The
specific antibodies can come about "naturally", i.e., as a consequence
of infection, or they can be produced "artificially", as from
vaccination.

Incidence and prevalence are important terms, often misused. They both relate
to the time during or at which a disease is studied. Events
such as (clinical) onset of the disease, or birth, or death occur at
a precise point in time, but various disease states, e.g., leptospirosis,
or schistosomiasis, or diabetes, exist over varying periods
of time, perhaps years. Incidence describes the number of events
(related to the occurrence, i.e., the onset of a disease) which took
place during a specified time. Prevalence, on the other hand, refers
to the number of cases of a particular disease which existed -- at
any stage -- at (or during) a particular time in a given population.

For example: the incidence of leptospirosis in human beings in
country X was determined as 127 cases per 100,000 population for 1964.
The figure 127 includes (properly) only those cases that began during
1964. The prevalence of leptospirosis in human beings in country X
was determined as 147 per 100,000 population for the year 1964. (In
this hypothetical study, much care was taken not to count the same
diseased person more than once.) This figure, 147, includes those
cases which had their onset before 1 January 1964, but which persisted
into the time period under observation, i.e., 1 January through
31 December 1964. From these particular incidence and prevalence
figures, one could infer that leptospirosis was a disease that probably
persisted for several weeks, since 20/147 cases observed in one year
had their beginning before that year. The more chronic the disease,
obviously, the greater the disparity between incidence and prevalence
figures. "Point prevalence" refers to the number of cases present
during a very short period of observation. (Short-term field surveys
usually determine point prevalence, i.e., the number of cases -- at
all stages -- present at the time that the particular population was
examined.) "Period prevalence" refers to the number of cases present
during a (longer) specified period of observation.
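
The arithmetic behind the hypothetical leptospirosis example can be checked with a short calculation. The sketch below is illustrative only: the function names, the representation of a case as an (onset, resolution) pair of dates, and the individual dates are assumptions; only the totals (127 new cases, 147 prevalent cases, population 100,000) come from the example above. The number of cases carried over from the previous year is the difference between the two figures, 147 - 127 = 20.

```python
from datetime import date
from typing import List, Tuple

Case = Tuple[date, date]  # (onset, resolution); a hypothetical representation


def incidence(cases: List[Case], start: date, end: date) -> int:
    """Count cases whose onset falls within the observation period."""
    return sum(1 for onset, _ in cases if start <= onset <= end)


def period_prevalence(cases: List[Case], start: date, end: date) -> int:
    """Count cases existing, at any stage, at some time during the period."""
    return sum(1 for onset, resolved in cases if onset <= end and resolved >= start)


def per_100k(count: int, population: int) -> float:
    """Express a count as a rate per 100,000 population."""
    return count * 100_000 / population


start, end = date(1964, 1, 1), date(1964, 12, 31)
# 127 cases beginning during 1964 (all dates are made up for illustration).
new_cases = [(date(1964, 6, 1), date(1964, 7, 1))] * 127
# 20 cases beginning late in 1963 and persisting into 1964.
carried_over = [(date(1963, 12, 15), date(1964, 1, 20))] * 20
cases = new_cases + carried_over

print(per_100k(incidence(cases, start, end), 100_000))
print(per_100k(period_prevalence(cases, start, end), 100_000))
```

The more chronic the disease, the larger the carried-over group becomes relative to the new cases, which is why the two figures diverge.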

Infection is a disease state resulting from an (infectious) agent -- virus,
bacteria, spirochete, yeast, fungus, or (animal) parasite -- living
in the host and producing some sort of defensive reaction by the
host. The infection is not always apparent -- either to the patient
or his doctor. Sometimes, although there are no signs or symptoms,
there is laboratory evidence of the reaction, e.g., the presence of
antibodies specific for the infectious agent. In this latter instance
the infection is said to be silent, or inapparent or (sometimes)
"subclinical". Infection is to be sharply distinguished from
contamination, a situation in which infectious agents may be "resting"
on the exterior surfaces of the body or upon articles of clothing,
etc. The term contamination also applies to conditions in which the
infectious agents are contained within soil or water or food.

Infestation is ordinarily applied to ectoparasites and describes a
host-parasite relationship in which the parasite lives on the surface of
the host. In some instances, e.g., scabies, the parasite may temporarily
invade and inhabit the superficial tissues of the host. (It
is also possible for an ectoparasite to live on such "internal" surfaces
as the intestinal mucosa, but, again, it must not invade the
tissues of the host, otherwise the relationship would be one of infection.)

Morbidity relates to the (non-lethal) manifestations of disease. Morbidity
rates are, in essence, "sick rates".

Mortality, in relation to disease, concerns the lethality of the disease.
As a rule, mortality rate pertains to the ratio of number of deaths
from a given disease to the total population under study. See "Rates".

Pandemic/panzootic diseases are those intermittently present at a high rate
over a very large area, e.g., several countries -- or diseases continuously
present but now at a much higher rate than usual. In a
sense, a pandemic is a very widespread epidemic. As before, the
suffix, -demic, relates to human beings; the suffix, -zootic, relates
to animals.

Parasite, in its broadest sense, includes all living agents that live in or
on a host, deriving benefit from the host, but not necessarily producing
disease. These agents include viruses, bacteria, spirochetes,
yeasts and fungi, as well as parasitic agents (in the narrow sense).
In its restricted meaning, the term parasite refers only to ANIMAL
agents; viruses, bacteria, spirochetes, and yeasts and fungi are excluded.

Pathogenicity refers to the capacity of an infectious agent to cause disease
in a susceptible host.

Pathology is the study of disease, with particular concern for its cause
(etiology), the mechanisms of its development (pathogenesis), and the
nature of its effects, especially those which are of value in establishing
specific diagnosis.

Portal of entry refers to the route through which the infectious agent enters
the body, e.g., by inhalation, by ingestion, through a traumatic
wound, injected in the course of a mosquito bite, etc.

Prevalence: See incidence.

Rates are expressions of the frequency with which a certain event or
circumstance occurs in relation to time. There are many kinds of rates:
death rates ordinarily relate mortality (often from a specific disease)
to the entire population at risk, and are usually expressed as: [deaths
(per year) / population] X 1,000; case fatality rates describe the
mortality from a specific disease as: [number of deaths from the
disease / total number of cases of the disease] X 100.
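
The two formulas above translate directly into arithmetic. The following minimal Python sketch shows them side by side; the function names and the sample figures are hypothetical and are not taken from any survey cited in this report.

```python
def death_rate_per_1000(deaths_per_year: int, population: int) -> float:
    """[deaths (per year) / population] X 1,000."""
    return deaths_per_year * 1000 / population


def case_fatality_rate_percent(deaths_from_disease: int, cases_of_disease: int) -> float:
    """[number of deaths from the disease / total number of cases] X 100."""
    return deaths_from_disease * 100 / cases_of_disease


# Made-up figures: 50 deaths in a population of 10,000, and
# 5 deaths among 200 cases of a particular disease.
print(death_rate_per_1000(50, 10_000))
print(case_fatality_rate_percent(5, 200))
```

Note that the two rates use different denominators: the death rate is referred to the whole population at risk, whereas the case fatality rate is referred only to those who have the disease.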

Reservoir (of infection) is closely related to source, but differs in an
important respect. It is a place within a host-parasite system where
the population of the infectious agent is maintained, and from which
a vector commonly transmits it to a susceptible host. (It is not
necessary that the infectious agent multiply in a reservoir.)

Source (of infection) is a place, animate or inanimate, where the infectious
agent(s) is generated (i.e., multiplies), and from where it may be
introduced into a new area.

Sporadic cases of disease are those intermittently present, at a low rate.

Vector is an object, either animate or inanimate, that transports an
infectious agent to its host. Vectors may be mechanical or biologic.
Biologic vectors may also make an essential contribution to the growth
and/or development of the parasite, e.g., the mosquito in malaria.
(When they make this essential contribution they are called intermediate
hosts.)

Virulence is a term somewhat comparable to pathogenicity but it pertains to
the ability of the organism to produce severe illness. A highly virulent
agent is one which is likely to produce a very serious infection.

Zoonoses  are  diseases  of  animals  that  may  be  transmitted  to  man. 


"But o'er anxious thought you'll find of no avail,
For there precisely where ideas fail,
A word comes opportunely into play.
Most admirable weapons words are found,
On words a system we securely ground,
In words we can conveniently believe,
Nor of a single jot can we a word bereave."

Johann Wolfgang von Goethe

Data sources


Section IV has considered data characteristics, and Section V data
collection, and we shall not repeat here the things that were discussed
there. This brief consideration of data sources -- where to find them and
how to get them -- focuses on health/disease data. It makes no attempt to
be exhaustive; the particular sources cited are meant to serve simply as
examples.

BOOKS are enormously valuable data sources, but often of more historic
than current use, and this applies to even the most recent publications
(reflecting the time lag between gathering the data, converting it to
manuscript, and getting the manuscript published). Nevertheless, books
such as Studies in Disease Ecology, edited by Jacques M. May, Hafner, New
York, 1961, and Tropical Health -- A Report on a Study of Needs and Resources,
Publication 996 of the National Academy of Sciences - National Research
Council, Washington, 1962, can be extremely useful.

The vast numbers of PUBLISHED ARTICLES listed in the Cumulated Index
Medicus are relatively accessible, but, far too often, the title of the
paper does not reflect some of the crucially important data that it contains.
The "demand search" function of MEDLARS (Medical Literature Analysis and
Retrieval System -- National Library of Medicine, Washington) helps to overcome
this difficulty, but to only a limited extent. In a recent report (Jan.
1968), Evaluation of the MEDLARS Demand Search Service, it was stated that:
"... the system is operating, on the average, at about 58% recall and 50%
precision". Furthermore, MEDLARS is primarily concerned with key words, not
content, per se.

Another approach to the data source selection problem is effectively
used by such abstracting services as Chemical Abstracts and Biological
Abstracts (BioSciences Information Service), and it is possible to arrange
for special services with such organizations as these. Under these conditions,
the user can get a good idea of the article's content, and selectivity
becomes progressively more precise -- depending upon how much one
wishes to pay for this precision.

There is a tremendous amount of important data, the sources of which
are not included in Cumulated Index Medicus, i.e., non-indexed data
sources, and it is a major problem just being aware that some of these reports
exist. The remainder of this general discussion of data sources will
concern this category of information.

* * *

There are three principal approaches to the non-indexed data material.
One can look for these data by: (1) geographic area, or (2) disease or
environmental factors, per se, or (3) a primary data source, e.g., a
specific research institute, a specific hospital, a specific individual, etc.
Any effort to build a data base, in depth, should use all three of these
approaches, although they may overlap to a considerable extent.

As to GEOGRAPHIC AREA, the local government is an important first
source. In many of the so-called developing countries, the principal
source of "official" data for the country will be the Ministry of Health,
and there may be a series of annual reports that provide valuable current
information as well as data that allow important historic perspective.

In "developed" countries, such as the United States, there are a great many
governmental sources of data pertaining to disease-environmental situations
of that country: the Bureau of Census, the National Institutes of Health,
the Communicable Disease Center, the United States Army, to name but a few.
The various State Health Departments have additional, more detailed
information and, finally, there may be still more precise data available from
specific County Health units.


MAPPING  OF  DISEASE 

Important non-governmental sources of (local) data are also numerous:
universities, research institutes, and organizations such as the American
Medical Association, the American Cancer Society, the New York State Life
Insurance Company, etc.

Turning to sources of data which are international in scope, WHO,
PAHO, and FAO are very important primary sources for most of the world.
Coverage may be very broad (geographically), or quite restricted, e.g.,
WHO's report, Studies on Immunoglobulins of Nigerians, 1966. Often the
reports concentrate on a particular disease or condition, e.g., WHO's
Malaria Yearbooks and PAHO's Immunologic Aspects of Parasitic Infection,
1967. In addition to the official international organizations, those
individual governments that have had a long interest in international
affairs are rich sources of information dealing with other countries. In
the United States, for example, much information is available from the
National Institutes of Health, the Department of State (especially AID,
and the Bureau of Intelligence and Research), the Communicable Disease
Center, and the Department of Defense (consider this report, for example).

Periodic reports from DOD, Army, Navy, or Air Force units, and from other
Governmental agencies, are valuable data sources, and we list below
illustrative examples of these:

406th Medical Laboratory Professional Report (annual),
United States Army Medical Command, Japan.

Annual Progress Report, SEATO Medical Research Laboratory
Clinical Research Center, Bangkok, Thailand.

Annual Work Unit Progress Report from the various
Naval Medical Research Units (NAMRU's), e.g., Serologic,
Epidemiologic, and Vaccine Studies on Meningococcal
Meningitis (report of a study carried out in Egypt and
Morocco).

Annual Research Project Report, Armed Forces Institute
of Pathology (AFIP), Washington.

Annual Reports of the U.S. Army Medical Research Unit's
Institute for Medical Research at Kuala Lumpur, Malaysia.

Annual Progress Reports, U.S. Army Research Institute
of Environmental Medicine, Natick, Mass. (of the U.S.
Army Medical Research and Development Command).

National Communicable Disease Center's Morbidity and
Mortality Reports and (sporadic) Surveillance Reports.

Forest Service Research Papers, e.g., Weather in the
Luquillo Mountains of Puerto Rico (250 pages), by
C. B. Briscoe, The Institute of Tropical Forestry,
Forest Service, U.S. Department of Agriculture.

SPECIFIC PRIMARY DATA SOURCES can be very valuable, for example, the
Rockefeller Institute, Johns Hopkins University (the Geographic
Epidemiology Unit of the School of Hygiene and Public Health), the Institute
of Public Health of Iran, the Liverpool School of Tropical Medicine, the
Walter Reed Army Institute of Research, the Technical Assistance
Information Clearing House, American Council of Voluntary Agencies (44 E. 23rd
Street, New York, N. Y. 10010), etc. Several illustrations of data
available from these sources follow:

Annual Report on the Research Activities of the Liberian
Institute of the American Foundation for Tropical Medicine.

Tulane University (New Orleans)/Universidad del Valle
(Cali, Colombia) periodic Progress Reports, an ICMRT (NIH)
supported program.

Relation of Geology and Trace Elements to Nutrition (based
on papers presented at a Symposium held at the Annual
Meeting of the Geological Society of America, New York,
1963), edited by H. L. Cannon and D. F. Davidson.

Proceedings of the 7th International Congress of Tropical
Medicine and Malaria (four volumes), Rio de Janeiro,
Sept. 1-11, 1963.

The Physical Environments and Agriculture of Thailand, by
M. Y. Nuttonson, a publication of the American Institute of
Crop Ecology, Washington, 1963.

Then there are collections of more general data, some of which are
reissued periodically, bringing the information up-to-date. For example:

Health Data Publications, pertaining to individual
countries (some 44 have been released), prepared by
the Department of Health Data, Division of Preventive
Medicine, The Walter Reed Army Institute of Research.

Special Warfare Area Handbooks for various countries,
prepared by Foreign Areas Studies Division, Special
Operations Research Office of the American University,
Washington (operating under contract with the
Department of the Army).

U.S. Army Area Handbooks for various countries
(Department of the Army pamphlets).

The World Food Problem, a two volume report of the
President's Science Advisory Committee, 1967.

These many valuable reports are not "lost" even though they are not
indexed in the Quarterly Cumulative Index. There is a variety of other
indices available, if one knows where to find them. For example:

U.S. Government Research and Development Reports, a
semi-monthly abstract journal produced by the Clearinghouse
for Federal Scientific and Technical Information of the
U.S. Department of Commerce.

Air Force Scientific Research Bibliography (abstracts of
all USAF Office of Scientific Research Supported Research
Projects).

ILSE: Interagency Life Sciences Supporting Research and
Technology Exchange, prepared by Documentation, Inc.
(abstracts NASA and DOD Research Work-Units in Life
Sciences).

Pesticides Documentation Bulletin, National Agricultural
Library, U.S. Department of Agriculture.

But there are many potentially valuable publications that escape the usual
indexing mechanisms, and these may be of crucial value in connection with
certain geographic areas. For example:

The South African Institute for Medical Research,
Annual Reports.

Contribuição ao Estudo da Patologia das Arboviroses,
by Domingos De Paola (91 pages), Rio de Janeiro, 1964.
(A thesis presented in partial fulfillment of
requirements to attain Professorial status
(docencia-livre), privately published and
distributed by the author. Particularly in Latin
America, many (medical) doctoral and professorial
theses, such as this one, represent hidden
sources of very valuable local disease-environmental
data.)

Many SPECIFIC INDIVIDUALS could be mentioned for their store of
information and, often even more important, for their knowledge of where to
find particular information. As an example, it seems appropriate
to mention the single individual who has been most helpful to this
project in providing information about leptospirosis: Dr. Aaron D.
Alexander (Walter Reed Army Institute of Research). In accumulating
information in depth, and in evaluating the depth of information coverage
one has achieved, there is no substitute for wise, skillful, informed
consultants who, as a rule, not only help to identify defects in the data
base, but give sound advice as to how they can be corrected.

Language can be an important barrier, particularly when the data
are reported in a language with which few are familiar, and there is no
question but that many valuable data are lost to general use because of
this. English translations are available for some of the larger, more
important reports, for example: Natural Foci of Transmissible Diseases
as Related to Territorial Epidemiology of Zooanthroponoses, USSR, a
254 page English translation prepared under contract for the Joint
Publications Research Service, Clearinghouse for Federal Scientific and
Technical Information, U.S. Department of Commerce. But such "free"
translation service only (partially) covers a rather highly selective subject
area.

*  *  * 

The leptospirosis source documents used in this study were located
by a combination of general, comprehensive surveillance of epidemiologic
literature and personal contacts with leptospiral workers. In addition
to searching standard bibliographies and abstract journals for these
documents, LTC Watson's group obtained computer-produced bibliographic
lists from such agencies as the National Library of Medicine, the
Defense Documentation Center, the Army Research Office, and the Military
Entomological Information Service. The World Health Organization and the
Pan American Health Organization were also used extensively as data
sources.


PUBLISHED  MAPS 


In ordinary usage, MOD generated mapped disease and environmental data
would be directly related to (overlaid on) base maps showing topographic
features, geologic characteristics, land usage, population density, etc.
For this reason such maps represent a very important part of MOD data
even though the data contained therein will not (as a rule) be subjected to
computer processing.

As a part of our data collection study we have carried out a pilot
investigation to determine availability and source of (potential) base maps,
and their usefulness. This study was undertaken with the help of Dr. R.
Warwick Armstrong, Assistant Professor of Geography, the University of
Illinois. Dr. Armstrong has been a valuable consultant medical geographer on
the MOD project since its inception, and it was through his cooperation that
Mr. Gary G. Gullett, graduate student of the Department of Geography, was
employed part time to evaluate source material in the form of published maps,
floristic atlases, etc., which might serve as base maps or otherwise
contribute in an important way to the data base of the MOD system.

Mr. Gullett made this survey under the guidance of Dr. Armstrong,
utilizing the extensive libraries of the University of Illinois at Urbana,
libraries which include one of the largest collections of cartographic/
geographic data in the United States. Mr. Gullett's final report, which
follows, is presented in its entirety.

In addition (following Mr. Gullett's report) there are listed the major
primary sources of maps which were found to be most useful to the MOD project.




Appendix 


1.  Purpose  The purpose of this report is to indicate the location,
quality, and quantity of specific maps in the various libraries of the
University of Illinois, Urbana, Illinois, which would serve as a data source
for the MOD project.


2.  Scope  The scope of this study includes the searching, assessment, and
transmittal of cartographical material concerned with the following
categories:


(1)  climate 

(2)  vegetation  patterns  and  associations 

(3)  ecological  studies 

(4)  soil  distributions  and  associations 

(5)  entomology 

(6)  zoology 

(7)  forestry and forest associations

(8)  land  use  and  land  use  patterns 

(9)  hydrology 

(10)  general  topography 


The search for, assessment of, and transmittal of material in the above
categories were pursued with reference only to the following geographical
areas, and at scales ranging from 1:25,000 to 1:10,000,000:

(1)  World  (entire) 

(2)  World (major section, continent, ocean)

(3)  Southeast  Asia 

(4)  Thailand 

(5)  Burma 

(6)  French Indo-China

(7)  Philippines

(8)  Malaya 

(9)  The  Midwest  of  the  U.S. 

(10)  Illinois  (entire) 

(11)  Southern  Illinois  (Quadri-County  area) 

3.  Method of Collecting Data  Information was obtained exclusively by
personal investigation of the cartographical holdings of the several
departmental libraries at the University of Illinois. Every appropriate map
was checked out of the libraries and the necessary information transcribed
onto data-collection forms and, where appropriate, the legends reproduced by
xerox. Where there was a series of maps, such as the 1:1,000,000 coverage
of the world, only one map in the series was selected, and its
representative legend and inherent information transcribed. In addition, a
notation was made, marking the number of maps in that series, their extent of
coverage, and their call numbers. All necessary information was transmitted
to the MOD project headquarters in Washington, D.C. on data-collection
forms, with the xeroxed legend attached. A copy of the data-collection form
is attached to this report.

4.  Results  The Map and Geography Library was by far the greatest source
of maps for this project. Over 140 different maps were obtained from this
library, and many of these 140 "maps" comprised a series of sheets. The
total number of applicable sheets was approximately 1,400. The quality
of these maps is high and they are relatively up-to-date. They deal, in
one way or another, with all the previously mentioned categories and cover
most of the regions listed. Those maps which have a large series of sheets
associated with them are designated below by their call numbers, as found
in the Map and Geography Library. The number of associated sheets is also
given.

                       Call Number              Number of
                                                sheets

Southeast Asia         G8000s 250 .U5              174

Philippines            G8060s 250 .U5               60
                       G8060s .C5 1963 .P4           8
                       G8062 50 .U51                39
                       G8062s .C45 25 .U5          116
                       G8062s .L8 50 .U5           215
                       G8062 .M5 25 .U5            175

French Indo-China      G8010s 50 .I5                83
                       G8000 100 .I5               236
                       G8010s 100 .C7                5
                       G8010s 400 .I3               24

Malaya                 G8030s 63 .G7                68
                       G8013 .V5 500 .U5            12

Burma                  G7720s 250 .U5               45
                       G7720 253 .C7                63


The second most important source of maps was the Map Library of the
Department of Geography, located in Davenport Hall. This library
(maintained and managed by a graduate student) yielded 68 maps. These maps
are both regional and topical in nature. They are all single sheet maps; no
series of sheets are associated with any of them. Unlike the fine quality
and physical condition of the maps in the Map and Geography Library
collection, a number of these maps are quite old, show crude cartographical
techniques, and, in some instances, are rather worn. All of these maps
are wall maps, i.e., they have a wooden frame at the top and bottom, and
the majority are large (the smallest ones being 30" x 40"). Practically
every topical map covered the whole world; 26 maps dealt with specific
regions. Southeast Asia was well represented in this collection.


The number of appropriate maps possessed by the faculty of the
Department of Geography at the University was minimal. Professor R. W.
Armstrong was the only faculty member with applicable maps, and the number of
different sources was less than ten. The maps dealt with climate and
vegetation on a world-wide scale; there were no regional maps available.

Search of a number of other departmental libraries did not produce a
large quantity of suitable maps for this project. A list of the searched
libraries includes: Agriculture, Biology, Chemistry, Geology, Veterinary
Medicine, State Natural History, State Water Survey, and the State
Geologic Survey. A search at the Agronomy Department office in Turner Hall
yielded a number of high-quality soil association maps of the Quadri-County
area. It was discovered that a number of these departments have large
quantities of textual material and literature available that would be helpful
in the MOD project, but, as stated, there were relatively few applicable maps.


5.  Conclusion  All applicable maps of the size and quality needed for
this project have been sought out, their information recorded, xeroxed, and
sent to the MOD project headquarters in Washington, D.C. All sources of
maps have been checked, including different national atlases. The only
possible remaining source of cartographical material would be maps contained
in monographs, but they are of questionable value for the project. Most of
these maps would be not larger than page size, and the amount of detail and
degree of coverage would, of course, be limited.

Most of the maps (especially those with a number of associated sheets)
in the Map and Geography Library would be well-suited for detailed plotting
of data; those maps found in the Geography Library of Davenport Hall would
be well-suited for plotting gross data.




ENVIRONMENTAL-FACTOR MAP INFORMATION FORM

(mark all applicable boxes)


1.  Scope of map:

      World (entire)              S.E. Asia (entire)      Mid-western
      World (major section;       Thailand                Illinois (entire)
        continent, ocean)         Malaya                  Southern Illinois
        (specify:          )      Other                     (including
                                    (specify:        )      Quadri-Co. area)

2.  EXACT title of map (i.e., name of environmental factor mapped):

3.  Break-down of values of environmental factor mapped -
    XEROX LEGEND OF MAP AND STAPLE TO THIS SHEET.

4.  Method of representing data on map:

      dot-type symbols              shading/patterns (black-white)
      alphabetic/numeric symbols    shading/patterns (colors)
      contour-type lines            other (specify): ____________

5.  Projection used:

      equirectangular (plane-chart)
      Miller cylindrical
      Mercator
      homolosine
      Other (specify: ____________ )

    also note whether:

      longitude meridians are straight or curved lines
      latitude parallels are straight or curved lines
      projection is interrupted, interrupted and condensed,
      condensed, or non-interrupted and non-condensed

6.  Scale of map: 1/

7.  Dimensions of map (cross out non-applicable units):
      (width) ______ cm./in.  X  (height) ______ cm./in.

8.  Date of publication of map:

9.  Date(s) of data mapped:  (earliest)          (latest)

10. Data compiled by (if different from item 11):

11. Bibliographic reference for map: book--author, date, title of book,
    publisher and city, page; journal--author, date, title of article,
    name of journal, volume, (number), page.

12. Call number of source containing map:

13. Call number of map (if different from item 12):

14. Physical location of map (or source containing map):

      Univ. of Illinois main library
      Univ. of Illinois map library
      Faculty member's personal library
        (specify whose: ____________ )
      AFIP Ash Library
      AFIP Geographic Path. - Geographic Zoon. library
      Staff member's personal library
        (specify whose: ____________ )
      Other location (specify): ____________

15. Additional remarks (use reverse side if necessary):

16. This form was completed on:

      (Date) ____________   (Name) ____________
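
Although the map data themselves were not intended for computer processing,
the entries on a form of this kind could be held as a simple record. The
following is a minimal sketch only, assuming one record per form; the class
name, field names, and the example values are hypothetical and do not come
from the MOD project. The scale limits are those quoted in the Scope section
of Mr. Gullett's report (1:25,000 to 1:10,000,000).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Scale limits quoted in the Scope section of the report.
MIN_SCALE_DENOMINATOR = 25_000
MAX_SCALE_DENOMINATOR = 10_000_000


@dataclass
class MapFormRecord:
    """One completed map information form (hypothetical field names)."""
    scope: str                          # item 1, e.g. "Thailand"
    title: str                          # item 2, environmental factor mapped
    scale_denominator: int              # item 6, the N in 1/N
    representation: List[str] = field(default_factory=list)  # item 4
    projection: Optional[str] = None    # item 5
    width: Optional[float] = None       # item 7
    height: Optional[float] = None      # item 7
    units: str = "cm"                   # item 7, "cm" or "in"
    date_published: Optional[str] = None    # item 8
    data_earliest: Optional[str] = None     # item 9
    data_latest: Optional[str] = None       # item 9
    call_number: Optional[str] = None       # items 12 and 13
    location: Optional[str] = None          # item 14
    remarks: str = ""                       # item 15

    def scale_in_survey_range(self) -> bool:
        """True if the map scale lies within the range the survey covered."""
        return MIN_SCALE_DENOMINATOR <= self.scale_denominator <= MAX_SCALE_DENOMINATOR


# Hypothetical example values, for illustration only.
example = MapFormRecord(
    scope="Thailand",
    title="Soil associations (hypothetical title)",
    scale_denominator=250_000,
    representation=["shading/patterns (colors)"],
    location="Univ. of Illinois map library",
)
print(example.scale_in_survey_range())
```

A record with a scale outside the stated range (for example 1:50,000,000)
would fail the same check, which is one way such forms could be screened
before maps are considered as base maps.
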




The major primary sources of maps which were found
to be most useful to the MOD project are as follows
(arranged in alphabetical order):


Aeronautical Chart and Information Center (U.S.A.F.)

American Geographical Society

Coast and Geodetic Survey (Dept. of Commerce)

Earth Science Division of the U.S. Army Natick Laboratories
(various atlases dealing with climate, insect vectors, etc.)

F.A.O. (especially in relation to crop ecology)

Geological Survey (Dept. of Interior)

National Geographic Society

U.S. Army Map Service, Corps of Engineers

U.S. Navy Hydrographic

U.S. Naval Oceanographic Office


Schistosomiasis represents a group of diseases caused by any one of
three blood flukes: S. mansoni (the one with which we have been primarily
concerned in our mapping program), S. haematobium, and S. japonicum. Each
of these specific organisms has its own (geographic) distribution pattern,
although they overlap.

As is shown in Figure A-2 (taken from Pathology of Tropical Diseases,
by Ash and Spitz), these organisms have a rather complex life cycle, and
each requires an intermediate host (a particular species of snail) in
order to fulfill a vital stage of its development. The infected human
being releases (excretes) eggs in either his feces or urine. If these
eggs are discharged into fresh water, they hatch as free-swimming organisms
(miracidia). Then, if they are able to reach the appropriate species of
snail within a short time, they infect the snail (producing disease) and
develop into a larval stage. At this stage they break out of the snail to
become a free-swimming form once again (cercariae). If man is exposed to
infested water for even a moment or two, the organism can actively force
entry through his skin and into cutaneous blood capillaries or venules.

Once in the blood stream they are carried to their particular (organ) site
of preference for further development. S. mansoni and S. japonicum prefer
the veins which supply the colon, and the colon and rectum are primarily
involved; S. haematobium concentrates in the blood vessels of the urinary
bladder, and the bladder is the primary site of disease. Once the larval
organisms reach their maturity, they begin to lay eggs, and it is these
eggs which incite a marked chronic inflammatory reaction. Major areas of
damage (depending upon the type of organism) are the colon, the liver, the
urinary bladder (and, secondarily, the kidneys).

In most instances (most patients do not receive adequate treatment),
the disease is a very chronic one, lasting for many years. It is apt
to become continually worse because continual re-exposure and re-infection
("super-infection") adds continually to the number of infecting parasites.

Schistosomiasis is a very important cause of death, especially among
infants. It is an even greater cause of morbidity, and it depletes the
energy of millions of people in the world, with very profound
socio-economic consequences. Because the life cycle of these organisms is
dependent upon an intermediate host (which, in turn, requires a suitable
kind of fresh water, etc.), this is one of the diseases which can
easily be introduced into a new area, or eliminated from an old area,
by changing the ecology.

[Figure: BLOOD FLUKES. Life-cycle diagram. Labels: CERCARIAE enter unbroken
SKIN, producing "SWIMMER'S ITCH"; MIRACIDIA; SPOROCYSTS or secondary
SPOROCYSTS in the snail hosts (Bulinus or Physopsis, Oncomelania,
Planorbis); CERCARIAE produced within SPOROCYSTS.]

Figure A-2  Phyllis Smith, 1944.
From Pathology of Tropical Diseases by Ash, J.E. and Spitz, S.,
Saunders Co., Philadelphia, 1945; AFIP neg. $00-18419.

Leptospirosis -- general considerations

Leptospirosis (known by many other names, including Weil's disease)
is an important acute infectious disease of man and animals (i.e., a
zoonosis) that is worldwide in its distribution. The disease was first
recognized in human beings in 1886. It varies greatly in its manifestations
(as shown in the adjacent figure). The severe disease, in man, affects
many organ systems, particularly the liver and the kidneys. Occasionally
the disease is fatal (in man), causing death within nine to twelve days.

SUBCLINICAL DISEASE
no overt manifestations
(but laboratory evidence of disease,
e.g., positive serologic tests)

SLIGHT CLINICAL DISEASE
non-specific manifestations
(e.g., FUO, fever of undetermined
origin)

MODERATE CLINICAL DISEASE
with or without characteristic signs
and symptoms

SEVERE CLINICAL DISEASE
possibly fatal

The causative organism belongs to the genus Leptospira of the family
Treponemataceae. Although there is only one species pathogenic for man
(L. interrogans), there are a great many "serotypes", virtually identical in
form, but differentiated on the basis of their antigenic characteristics.
Since each of the various serotypes has its own peculiar ecologic
characteristics, it is important (epidemiologically) to determine the
specific serotype responsible for a given infection.

The leptospiral organism is usually transmitted to man by food or drink
that has been contaminated by the urine and/or excreta of rats or other
rodents, or from immersion in water which has been contaminated by some
animal reservoir. Workers in sewers, irrigation ditches, rice fields, docks,
and abattoirs are at high risk.

Clinically (in the severe case), the onset is sudden with high fever,
headache, and generalized body pains. Depending upon the severity, renal
and hepatic involvement may lead to uremia and jaundice. Ordinarily, the
disease runs a course of three to four weeks in human beings, with a
mortality rate of about 5% (of the clinically recognized cases).

Leptospirosis was chosen as the principal disease to study by the
MOD project team because there are many known animal reservoirs (see
Figure A-1), the amount and nature of surface water is an important
ecologic factor in maintaining sources of infection, persons of certain
vocations are at high risk, etc. Furthermore, there is still a great
deal to be learned about leptospirosis. Two of the (13) recommendations
for research in a recent (1967) Report of a WHO Expert Group were as
follows:

•  Further studies of the ecology of reservoir hosts
are necessary, particularly in regions where
environmental conditions are rendered unstable by
human activity.

•  Results of ecological studies should be applied to
the forecasting of cycles of infection and to
systematic surveillance programmes on which
prevention of outbreaks in man could be based.




[Figure: schema in three panels. "The biologic reservoirs": opossums,
raccoons, muskrats, nutria, skunks (120* known "reservoirs"). "The 'free'
organism": <3 * 10* org./ml urine. "The susceptible (important) hosts":
goats, sheep, horses (12± known "hosts").]

Figure A-1  A schema showing relationships among man and animals,
and their environment, as they pertain to the ecology of leptospirosis.